This is the second post in my Bit fields series, which describes how not to use bit fields, how to use them, the limitations imposed by the architecture and the compiler’s implementation, the use of `volatile`, and finally a show-stopper as well as a proposal to fix it.
Specification
Below, I’ve repeated the documentation for the Line Control Register on the National Semiconductor 8250/16×50 Universal Asynchronous Receiver/Transmitter (UART) from my previous post:
| Bit(s) | Field | Value (high bit first) | Meaning |
|---|---|---|---|
| 7 | Divisor Latch Access Bit (DLAB) | | |
| 6 | Set Break Enable (BREAK) | | |
| 5, 4, 3 | Parity Select | X X 0 | No Parity |
| | | 0 0 1 | Odd Parity |
| | | 0 1 1 | Even Parity |
| | | 1 0 1 | Mark (1) |
| | | 1 1 1 | Space (0) |
| 2 | Stop Bits | 0 | 1 Stop Bit |
| | | 1 | 1.5 Stop Bits (5 Bits) or 2 Stop Bits (6-8 Bits) |
| 1, 0 | Word Length | 0 0 | 5 Bits |
| | | 0 1 | 6 Bits |
| | | 1 0 | 7 Bits |
| | | 1 1 | 8 Bits |
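To make the table concrete, here is a minimal sketch that decodes a raw LCR byte according to it, using the plain masks and shifts of the previous post. The function names (`lcr_data_bits`, `lcr_stop_half_bits`, `lcr_parity`, `lcr_break`, `lcr_dlab`) are hypothetical, and stop bits are returned in half-bit units so that 1.5 can be represented exactly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 1, 0: word length, 00 = 5 bits ... 11 = 8 bits. */
static unsigned lcr_data_bits(uint8_t lcr) {
    return 5u + (lcr & 0x03u);
}

/* Bit 2: stop bits, returned in half-bit units (2 = 1, 3 = 1.5, 4 = 2).
   1.5 stop bits only applies to 5-bit words. */
static unsigned lcr_stop_half_bits(uint8_t lcr) {
    if ((lcr & 0x04u) == 0)
        return 2u;                               /* 1 stop bit */
    return lcr_data_bits(lcr) == 5u ? 3u : 4u;   /* 1.5 or 2 stop bits */
}

/* Bits 5, 4, 3: parity select, read as a 3-bit number with bit 5 highest. */
static const char *lcr_parity(uint8_t lcr) {
    switch ((lcr >> 3) & 0x07u) {
    case 0x1: return "odd";
    case 0x3: return "even";
    case 0x5: return "mark";
    case 0x7: return "space";
    default:  return "none";                     /* X X 0: bit 3 clear */
    }
}

/* Bit 6: Set Break Enable; bit 7: Divisor Latch Access Bit. */
static bool lcr_break(uint8_t lcr) { return (lcr & 0x40u) != 0; }
static bool lcr_dlab(uint8_t lcr)  { return (lcr & 0x80u) != 0; }
```

For example, `0x03` decodes as 8 data bits, 1 stop bit and no parity, while `0x1B` decodes as 8 data bits, 1 stop bit and even parity.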
Definitions
Here are some (to me) easier-to-read C/C++ definitions (note that the first edition of K&R defined neither `enum` nor `bool`; both were added later):
```c
//
// UART/8250/LCR.h
//
// These are the definitions for the Line Control Register (LCR) of the
// National Semiconductor 8250 UART and its derivatives (16x50 etc.)
// Note: 0b... binary literals need C++14, C23, or a compiler extension (e.g. GCC).
//
#ifndef UART_8250_LCR_h
#define UART_8250_LCR_h

#include <stdbool.h> // bool in C99..C17; harmless in C++ and C23

enum WordLengths {
    Word5 = 0b00,
    Word6 = 0b01,
    Word7 = 0b10,
    Word8 = 0b11,
    Baudot = Word5, // 5-bit Baudot encodings (ITA-1)
    Murray = Word5  // 5-bit Murray encodings (ITA-2)
}; // WordLengths

enum StopBits {
    StopBits1   = 0b0,
    StopBits2   = 0b1,
    StopBits1_5 = StopBits2 // For use by 5-bit encodings
}; // StopBits

enum Parities {
    ParityNone  = 0b000,
    ParityOdd   = 0b001,
    ParityEven  = 0b011,
    ParityMark  = 0b101,
    ParitySpace = 0b111
}; // Parities

struct LCR {
    enum WordLengths wordLength  : 2; // Length of one word
    enum StopBits    stopBits    : 1; // Number of stop bit times
    enum Parities    parity      : 3; // Parity
    bool             breakEnable : 1; // Transmit Break?
    bool             dLAB        : 1; // Divisor Latch Access Bit
}; // LCR

#endif // UART_8250_LCR_h
```
Usage
To me, the above `enum`s are easier to read than the previous `#define`s. They define names more than values; the values are (almost) incidental. To hide the values even further, an `enum` lets you leave out an explicit value if it is the first enumerator and its value is 0, or if its value is one greater than the previous enumerator’s. I’ve left the values in to make it explicit that each one matches the specification, but note that the defined values do not factor in the bit position of the field in the `struct`.
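A minimal sketch of that rule, assuming a C11 compiler (for `static_assert` from `<assert.h>`): `WordLengths` can drop every value, while in `Parities` only `ParityNone` and `ParityOdd` can, because 2 is skipped after them. The last check illustrates the point about bit positions.

```c
#include <assert.h>   /* static_assert (C11) */

/* WordLengths without explicit values: the first enumerator defaults to 0
   and each following one is one greater than the previous. */
enum WordLengths {
    Word5,              /* implicitly 0 */
    Word6,              /* implicitly 1 */
    Word7,              /* implicitly 2 */
    Word8,              /* implicitly 3 */
    Baudot = Word5,     /* aliases still need an explicit value... */
    Murray = Word5      /* ...otherwise Murray would become Baudot + 1 */
};

enum Parities {
    ParityNone,         /* implicitly 0 (first enumerator) */
    ParityOdd,          /* implicitly 1 (previous + 1) */
    ParityEven  = 3,    /* 2 is skipped, so explicit from here on */
    ParityMark  = 5,
    ParitySpace = 7
};

static_assert(Word5 == 0 && Word6 == 1 && Word7 == 2 && Word8 == 3,
              "implicit values count up from 0");
static_assert(ParityNone == 0 && ParityOdd == 1,
              "an omitted value is the previous value plus one");

/* The enumerator is the field's value, not its place in the register:
   ParityEven is 3, but in the LCR byte it occupies bits 5..3, i.e. 0x18. */
static_assert((ParityEven << 3) == 0x18, "bit position is not part of the value");
```

If any of these checks were wrong, compilation would fail, so the enumerator values are verified at no run-time cost.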
It’s the final `struct` that shows the fields, their sequence, their types, their names, and their widths. All of this is information for the compiler, so that it can work out how to access each named field. It knows it needs to generate the masks, shifts, ANDs and ORs to implement what the programmer wrote:
```c
// Create and initialise an LCR value
// (designated initialisers: C99, or C++20 in declaration order)
struct LCR lcr = {
    .wordLength = Word8,
    .stopBits   = StopBits1,
    .parity     = ParityNone
}; // lcr

lcr.wordLength = Word7; // Look ma! No & or |! Ain't the compiler clever?
```
Surely this is much clearer than before?
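For comparison, here is a minimal sketch of the masks, shifts, ANDs and ORs that the compiler generates on your behalf, written by hand as hypothetical `set_field`/`get_field` helpers. The field positions passed to them come from the datasheet table (word length at bit 0, width 2; parity at bit 3, width 3); the layout a compiler actually chooses for a bit-field `struct` is implementation-defined.

```c
#include <stdint.h>

/* Hypothetical hand-written equivalents of bit-field access.
   `shift` is the field's lowest bit and `width` its number of bits. */

/* Equivalent of e.g. lcr.wordLength = Word7 */
static uint8_t set_field(uint8_t reg, unsigned shift, unsigned width, unsigned value) {
    uint8_t mask = (uint8_t)(((1u << width) - 1u) << shift); /* 1s over the field */
    reg &= (uint8_t)~mask;                                   /* AND: clear the old field */
    reg |= (uint8_t)((value << shift) & mask);               /* shift + OR: insert the new value */
    return reg;
}

/* Equivalent of reading e.g. lcr.parity */
static unsigned get_field(uint8_t reg, unsigned shift, unsigned width) {
    return (reg >> shift) & ((1u << width) - 1u);            /* shift + AND: extract */
}
```

For example, starting from `0x03` (8-bit words, one stop bit, no parity), `set_field(0x03, 0, 2, 2)` gives `0x02` (7-bit words), `set_field(0x02, 3, 3, 3)` then gives `0x1A` (even parity), and `get_field(0x1A, 3, 3)` reads the 3 back. Every one of those operations is what a single bit-field assignment or read hides.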
Extension
And for the Clock Divisor Register (CDR) example from my previous post, the definitions are even simpler:
```c
//
// Clock/CDR.h
//
#ifndef Clock_CDR_h
#define Clock_CDR_h

enum Divisors {
    Divisor1 = 0b00,
    Divisor2 = 0b01,
    Divisor4 = 0b10,
    Divisor8 = 0b11
}; // Divisors

struct CDR {
    enum Divisors uart1 : 2; // `enum` keyword required in C (optional in C++)
    enum Divisors uart2 : 2;
    enum Divisors spi   : 2;
    enum Divisors i2c   : 2;
}; // CDR

#endif // Clock_CDR_h
```
With the usage being even more straightforward:
```c
struct CDR cdr = { }; // Empty initialiser zeroes every field (C++, or C23)
cdr.spi = Divisor4;   // Set CDR for SPI to ÷4
                      // Ma, no << either!
```
Complications
One issue with `struct` bit fields compared with `#define`s is that the latter result in simple numerical values, while `struct`s are… well, `struct`s. That means they can’t be accessed as their “behind the scenes” type, which is often required to access the actual hardware register.
But this is easily accommodated by wrapping the entire `struct` in a `union`, overlaying it with a different representation. For LCR, that would be a byte:
```c
#include <stdint.h> // uint8_t in place of `byte`, which standard C does not define

union LCR {
    struct { // Anonymous, to access individual fields (C11; a common C++ extension)
        enum WordLengths wordLength  : 2; // Length of one word
        enum StopBits    stopBits    : 1; // Number of stop bit times
        enum Parities    parity      : 3; // Parity
        bool             breakEnable : 1; // Transmit Break?
        bool             dLAB        : 1; // Divisor Latch Access Bit (baud rate)
    }; // struct
    uint8_t lcr; // Access to whole byte
}; // LCR
```
Note a few things:

- It’s now the `union` that’s called `LCR`, not the `struct`;
- Indeed, the internal `struct` is completely anonymous! That means that its fields are “promoted” to its container (in this case, the `union`);
- The overlaid value has a name too, so it can be accessed as a complete entity.
The above means that the following accesses are legal:
```c
// After above initialisation of lcr...
lcr.wordLength = Word8;   // Update Word Length field

// For original IBM PC
outb(UART1_LCR, lcr.lcr); // Output byte to UART1's LCR port
                          // (port first here; glibc's <sys/io.h> outb() takes the value first)

// For other architectures
uart1.lcr.lcr = lcr.lcr;  // Assign to actual UART's LCR
```
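As a self-contained sketch of the `union` approach in C11, the hypothetical helper `lcr_image` below writes the individual fields and then reads the whole byte back. It uses `uint8_t` in place of `byte`, and it relies on two implementation-defined behaviours: `enum`-typed bit-fields (accepted by GCC and Clang) and the compiler allocating bit-fields from the least significant bit, as GCC and Clang do on little-endian targets such as x86 and ARM. On such a compiler the resulting bytes match the datasheet table; elsewhere they may not.

```c
#include <stdbool.h>
#include <stdint.h>

enum WordLengths { Word5 = 0, Word6 = 1, Word7 = 2, Word8 = 3 };
enum StopBits    { StopBits1 = 0, StopBits2 = 1 };
enum Parities    { ParityNone = 0, ParityOdd = 1, ParityEven = 3,
                   ParityMark = 5, ParitySpace = 7 };

union LCR {
    struct {                               /* anonymous struct (C11) */
        enum WordLengths wordLength  : 2;
        enum StopBits    stopBits    : 1;
        enum Parities    parity      : 3;
        bool             breakEnable : 1;
        bool             dLAB        : 1;
    };
    uint8_t lcr;                           /* whole-byte view */
};

/* Hypothetical helper: builds an LCR image field by field, returns the byte.
   Assumes LSB-first bit-field allocation (implementation-defined). */
static uint8_t lcr_image(enum WordLengths w, enum StopBits s,
                         enum Parities p, bool dlab) {
    union LCR u;
    u.lcr = 0;              /* clear every bit the fields live in */
    u.wordLength = w;
    u.stopBits = s;
    u.parity = p;
    u.dLAB = dlab;
    return u.lcr;           /* reading another member reinterprets the bytes (allowed in C) */
}
```

With GCC on x86-64, for instance, `lcr_image(Word8, StopBits1, ParityEven, false)` yields `0x1B`, the same value as the hand-built `0x03 | (3 << 3)`.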
Limitations
While the above shows why (in my opinion) bit fields are better, the next post describes their limitations.
Comments are welcome. I suggest that generic comments on the whole “Bit fields” series and concepts go on the main page, while comments specific to this sub-page are written here.