Bit fields: With

This is the second post of my Bit fields series, which describes how not to use bit fields, how to use them, the limitations imposed by architecture and the compiler’s implementation, the use of volatile, and finally a show-stopper as well as a proposal to fix it.

Specification

Below, I’ve repeated the documentation for the Line Control Register on the National Semiconductor 8250/16×50 Universal Asynchronous Receiver/Transmitter (UART) from my previous post:

Line Control Register (LCR)
Bit	Notes
7	Divisor Latch Access Bit (DLAB)
6	Set Break Enable (BREAK)
5, 4, 3	Bit 5	Bit 4	Bit 3	Parity Select
	X	X	0	No Parity
	0	0	1	Odd Parity
	0	1	1	Even Parity
	1	0	1	Mark (1)
	1	1	1	Space (0)
2	Bit 2	Stop Bits
	0	1 Stop Bit
	1	1.5 Stop Bits (5 Bits) or 2 Stop Bits (6-8 Bits)
1, 0	Bit 1	Bit 0	Word Length
	0	0	5 Bits
	0	1	6 Bits
	1	0	7 Bits
	1	1	8 Bits

Definitions

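Here are some (to me) easier-to-read C/C++ definitions (note that K&R’s first edition defined neither enum nor bool, which were added later, and that the 0b binary literals are standard only from C++14 and C23, though GCC and Clang accept them as an extension):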

//
// UART/8250/LCR.h
//
// These are the definitions for the Line Control Register (LCR) of the
// National Semiconductor 8250 UART and its derivatives (16x50 etc.)
//

#ifndef UART_8250_LCR_h
#define UART_8250_LCR_h
#include <stdbool.h> // bool, needed in C before C23 (built in for C++)

enum WordLengths {
   Word5  = 0b00,
   Word6  = 0b01,
   Word7  = 0b10,
   Word8  = 0b11,
   Baudot = Word5,   // 5-bit Baudot encodings (ITA-1)
   Murray = Word5    // 5-bit Murray encodings (ITA-2)
}; // WordLengths

enum StopBits {
   StopBits1   = 0b0,
   StopBits2   = 0b1,
   StopBits1_5 = StopBits2 // For use by 5-bit encodings
}; // StopBits

enum Parities {
   ParityNone  = 0b000,
   ParityOdd   = 0b001,
   ParityEven  = 0b011,
   ParityMark  = 0b101,
   ParitySpace = 0b111
}; // Parities

struct LCR {
   enum WordLengths wordLength  : 2; // Length of one word
   enum StopBits    stopBits    : 1; // Number of stop bit times
   enum Parities    parity      : 3; // Parity
        bool        breakEnable : 1; // Transmit Break?
        bool        dLAB        : 1; // Divisor Latch Access Bit
}; // LCR

#endif // UART_8250_LCR_h

Usage

To me, the above enums are easier to read than the previous #defines. They define names more than values; the values are (almost) incidental. To hide the values even more, an enum lets you omit an explicit value when the enumerator comes first and is 0, or is one greater than the previous enumerator. I’ve left them in to make explicit that each has a particular value matching the specification. Note, though, that the defined values do not factor in the bit position of the field in the struct.
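As a minimal sketch of that implicit numbering: the enum ImplicitWordLengths, its I-prefixed enumerators and the function check_implicit_values are hypothetical names, introduced only so the example doesn’t clash with LCR.h.

```c
#include <assert.h>

// Hypothetical variant of WordLengths with no explicit values
enum ImplicitWordLengths {
   IWord5,   // 0: the first enumerator defaults to 0
   IWord6,   // 1: one greater than the previous enumerator
   IWord7,   // 2
   IWord8    // 3
};

// Confirms that the implicit values match the Word Length rows of the LCR table
void check_implicit_values(void) {
   assert(IWord5 == 0);
   assert(IWord6 == 1);
   assert(IWord7 == 2);
   assert(IWord8 == 3);
}
```

Parities couldn’t be written this way without some explicit values, since its values (0, 1, 3, 5, 7) skip 2, 4 and 6.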

It’s the final struct that shows the fields, their sequence, their types, their names, and their widths. All of this is information for the compiler, so that it can work out how to access each named field. It knows it needs masks, shifts, ANDs and ORs to implement what the programmer wrote:

// Create and initialise an LCR value
struct LCR lcr = {
    .wordLength = Word8,
    .stopBits   = StopBits1,
    .parity     = ParityNone
}; // lcr

lcr.wordLength = Word7; // Look ma! No & or |! Ain't the compiler clever?

Surely this is much clearer than before?
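For comparison, here is a rough sketch of the work the compiler takes over, written by hand against a raw byte. The macro and function names are hypothetical, and the bit positions are taken from the LCR table above. The code a real compiler emits depends on the compiler and the target, so treat this as an illustration only.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical names; positions come from the LCR table
#define WORD_LENGTH_SHIFT 0u      // Word Length: bits 1, 0
#define WORD_LENGTH_MASK  0x03u
#define PARITY_SHIFT      3u      // Parity Select: bits 5, 4, 3
#define PARITY_MASK       0x38u

// Roughly what `reg.field = value` has to do:
// clear the field's bits, then OR in the shifted value
uint8_t set_field(uint8_t reg, unsigned mask, unsigned shift, unsigned value) {
   return (uint8_t)((reg & ~mask) | ((value << shift) & mask));
}

// Roughly what reading `reg.field` has to do: mask, then shift down
unsigned get_field(uint8_t reg, unsigned mask, unsigned shift) {
   return (reg & mask) >> shift;
}

void check_manual_access(void) {
   uint8_t lcr = 0x03;   // 8 bits, no parity, 1 stop bit

   lcr = set_field(lcr, WORD_LENGTH_MASK, WORD_LENGTH_SHIFT, 0x2); // like lcr.wordLength = Word7
   assert(lcr == 0x02);

   lcr = set_field(lcr, PARITY_MASK, PARITY_SHIFT, 0x3);           // like lcr.parity = ParityEven
   assert(lcr == 0x1A);

   assert(get_field(lcr, PARITY_MASK, PARITY_SHIFT) == 0x3);       // like reading lcr.parity
}
```

Every one of those masks and shifts is a chance for a typo that the bit field version simply doesn’t offer.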

Extension

And for the Clock Divisor Register (CDR) example from my previous post, the definitions are even simpler:

//
// Clock/CDR.h
//

#ifndef Clock_CDR_h
#define Clock_CDR_h

enum Divisors {
   Divisor1 = 0b00,
   Divisor2 = 0b01,
   Divisor4 = 0b10,
   Divisor8 = 0b11
}; // Divisors

struct CDR {
   enum Divisors uart1 : 2;
   enum Divisors uart2 : 2;
   enum Divisors spi   : 2;
   enum Divisors i2c   : 2;
}; // CDR

#endif // Clock_CDR_h

With the usage being even more straightforward:

struct CDR cdr = { };

cdr.spi = Divisor4; // Set CDR for SPI to ÷4 // Ma, no << either!
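Here is a self-contained sketch of the same idea, with the definitions repeated so the block compiles on its own. It assumes a compiler such as GCC or Clang that accepts enum-typed bit fields and treats these enums as unsigned. The values are written in hex and the initialiser as { 0 } so that it also builds as C before C23. The function cdr_example is a hypothetical name.

```c
#include <assert.h>

enum Divisors { Divisor1 = 0x0, Divisor2 = 0x1, Divisor4 = 0x2, Divisor8 = 0x3 };

struct CDR {
   enum Divisors uart1 : 2;
   enum Divisors uart2 : 2;
   enum Divisors spi   : 2;
   enum Divisors i2c   : 2;
};

// Assign the same enumerator to two different fields: the struct,
// not the enumerator's value, decides which bits are changed.
struct CDR cdr_example(void) {
   struct CDR cdr = { 0 };   // all fields start as Divisor1 (0)

   cdr.spi = Divisor4;
   cdr.i2c = Divisor4;
   return cdr;
}

void check_cdr_example(void) {
   struct CDR cdr = cdr_example();

   assert(cdr.uart1 == Divisor1);   // untouched
   assert(cdr.uart2 == Divisor1);   // untouched
   assert(cdr.spi   == Divisor4);
   assert(cdr.i2c   == Divisor4);
}
```

Note that Divisor4 never had to know whether it was destined for bits 5-4 or bits 7-6.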

Complications

One issue with struct bit fields compared with #defines is that the latter result in simple numerical values, while structs are… well, structs. That means they can’t be accessed as their “behind the scenes” type, which is often required to access the actual hardware register.

But this is easily accommodated by wrapping the entire struct with a union, and overlaying it with a different representation as well. For LCR, that would be a byte:

union LCR {
   struct {  // Anonymous, to access individual fields
      enum WordLengths wordLength  : 2; // Length of one word
      enum StopBits    stopBits    : 1; // Number of stop bit times
      enum Parities    parity      : 3; // Parity
      bool        breakEnable : 1; // Transmit Break?
      bool        dLAB        : 1; // Divisor Latch Access Bit (baud rate)
   }; // struct
   byte lcr; // Access to whole byte (byte: an 8-bit unsigned type, e.g. unsigned char)
}; // LCR

Note a few things:

It’s now the union that’s called LCR, not the struct;
Indeed, the internal struct is completely anonymous! That means that its fields are “promoted” to its container (in this case, the union);
The overlaid value has a name too, so can be accessed as a complete entity.

The above means that the following accesses are legal:

// After above initialisation of lcr...
lcr.wordLength = Word8;  // Update Word Length field

// For original IBM PC
outb(UART1_LCR, lcr.lcr); // Output byte to UART1's LCR port

// For other architectures
uart1.lcr.lcr = lcr.lcr; // Assign to actual UART's LCR
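To tie the pieces together, here is a self-contained sketch of the union in use. It assumes that byte is a typedef for unsigned char, that the compiler supports C11 anonymous structs and enum-typed bit fields (GCC and Clang do), and that all eight bits land in the first byte of the union. The functions lcr_7e1, parity_of and check_lcr_round_trip are hypothetical names, and no port access is attempted.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned char byte;   // Assumption: byte is an 8-bit unsigned type

enum WordLengths { Word5 = 0x0, Word6 = 0x1, Word7 = 0x2, Word8 = 0x3 };
enum StopBits    { StopBits1 = 0x0, StopBits2 = 0x1 };
enum Parities    { ParityNone = 0x0, ParityOdd = 0x1, ParityEven = 0x3,
                   ParityMark = 0x5, ParitySpace = 0x7 };

union LCR {
   struct {  // Anonymous struct (C11): fields are reached via the union
      enum WordLengths wordLength  : 2;
      enum StopBits    stopBits    : 1;
      enum Parities    parity      : 3;
      bool             breakEnable : 1;
      bool             dLAB        : 1;
   };
   byte lcr; // Whole-register view
};

// Build the register byte for "7 data bits, even parity, 1 stop bit"
// using only the named fields
byte lcr_7e1(void) {
   union LCR value = { .lcr = 0 };   // start with every bit clear

   value.wordLength = Word7;
   value.parity     = ParityEven;
   value.stopBits   = StopBits1;
   return value.lcr;                 // the byte that would go to outb()
}

// Load a raw byte (for example one read back from the port) and pick out a field
enum Parities parity_of(byte raw) {
   union LCR value = { .lcr = raw };

   return value.parity;
}

void check_lcr_round_trip(void) {
   assert(parity_of(lcr_7e1()) == ParityEven);
}
```

The round trip holds however the compiler orders the fields within the byte. Whether the byte itself matches the table is implementation-defined: a compiler that allocates bit fields starting from the least significant bit should produce 0x1A here.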

Limitations

While the above shows why (in my opinion) bit fields are better than #defines, the next post describes their limitations.

Comments are welcome. I suggest that generic comments on the whole “Bit fields” series and concepts go on the main page, while comments specific to this sub-page are written here.