Bit fields: Without

If you didn’t know about struct bit fields (or if you just don’t like them), how would you solve the problem?

This is the first post in my Bit fields series, which describes how to avoid bit fields, how to use them, the limitations imposed by the architecture and the compiler’s implementation, the use of volatile, and finally a show-stopper along with a proposal to fix it.


Specification

Consider the following documentation for the Line Control Register on the National Semiconductor 8250/16×50 Universal Asynchronous Receiver/Transmitter (UART):

Line Control Register (LCR)

Bit      Notes
7        Divisor Latch Access Bit (DLAB)
6        Set Break Enable (BREAK)
5, 4, 3  Bit 5  Bit 4  Bit 3  Parity Select
         X      X      0      No Parity
         0      0      1      Odd Parity
         0      1      1      Even Parity
         1      0      1      Mark (1)
         1      1      1      Space (0)
2        Bit 2  Stop Bits
         0      1 Stop Bit
         1      1.5 Stop Bits (5 Bits) or 2 Stop Bits (6-8 Bits)
1, 0     Bit 1  Bit 0  Word Length
         0      0      5 Bits
         0      1      6 Bits
         1      0      7 Bits
         1      1      8 Bits

I chose this real-world example for a number of reasons:

  1. It’s quite small;
  2. It has both one-bit and multi-bit fields;
  3. The interpretation of one field (Stop Bits) depends on another field (Word Length);
  4. Not all bit patterns are used (there are “dead” encodings);
  5. The register is used for more than just setting the character format (DLAB and BREAK fields are for other functions).
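To make reason 3 concrete, here is a minimal sketch of decoding the Stop Bits field, which cannot be interpreted without also looking at Word Length. The function names are my own invention for illustration (they are not part of any real driver), and the stop time is returned in half-bit units purely so that 1.5 can be expressed as an integer:

```c
#include <stdint.h>

// Hypothetical helper: word length in bits (5..8) from a raw LCR value.
static unsigned lcr_word_length(uint8_t lcr) {
    return 5u + (lcr & 0x03u);   // bits 1..0: 00 -> 5, 01 -> 6, 10 -> 7, 11 -> 8
}

// Hypothetical helper: stop time in half-bit units (2 = 1, 3 = 1.5, 4 = 2).
static unsigned lcr_stop_half_bits(uint8_t lcr) {
    if ((lcr & 0x04u) == 0u) {
        return 2u;               // bit 2 clear: always 1 stop bit
    }
    // Bit 2 set: the meaning depends on the Word Length field.
    return (lcr_word_length(lcr) == 5u) ? 3u : 4u;
}
```

The raw masks (0x03 and 0x04) are used here only because the named definitions have not been introduced yet.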

Definitions

The following code uses #defines for each of the definitions. Note that I used “binary literal” syntax to highlight the bit values. This was a compiler extension before C23 (GCC and Clang both support it):

//
// UART/8250/LCR.h
//
// These are the definitions for the Line Control Register (LCR) of the
// National Semiconductor 8250 UART and its derivatives (16x50 etc.)
//

#ifndef UART_8250_LCR_h
#define UART_8250_LCR_h

#define LCR_WORD_MASK   0b00000011 // Use to clear field first
#define LCR_WORD_5      0b00000000
#define LCR_WORD_6      0b00000001
#define LCR_WORD_7      0b00000010
#define LCR_WORD_8      0b00000011
#define LCR_WORD_BAUDOT LCR_WORD_5 // Baudot code (ITA-1), not ASCII
#define LCR_WORD_MURRAY LCR_WORD_5 // (also used for Murray code, ITA-2)

#define LCR_STOP_MASK 0b00000100
#define LCR_STOP_1    0b00000000
#define LCR_STOP_2    0b00000100 // When Word Length is 6-8 bits
#define LCR_STOP_1_5  LCR_STOP_2 // When Word Length is 5 bits

#define LCR_PARITY_MASK  0b00111000 // Use to clear field first
#define LCR_PARITY_NONE  0b00000000
#define LCR_PARITY_ODD   0b00001000
#define LCR_PARITY_EVEN  0b00011000
#define LCR_PARITY_MARK  0b00101000
#define LCR_PARITY_SPACE 0b00111000

#define LCR_BREAK_MASK    0b01000000
#define LCR_BREAK_DISABLE 0b00000000
#define LCR_BREAK_ENABLE  0b01000000

#define LCR_DLAB_MASK    0b10000000
#define LCR_DLAB_DISABLE 0b00000000
#define LCR_DLAB_ENABLE  0b10000000

#endif // UART_8250_LCR_h

Usage

The above definitions make it easy to build a complete value for the LCR by simply ORing together the appropriate values:

// Useful typedef. It does assume an 8-bit-byte architecture!
typedef unsigned char byte;

// The most common RS-232 character format:
// No parity, 8-bit ASCII, with one stop bit
static const byte N81 = (LCR_PARITY_NONE | LCR_WORD_8 | LCR_STOP_1);

// A really old teletype format:
// No parity, 5-bit Baudot, one and a half stop bit times
static const byte Baudot = (LCR_PARITY_NONE | LCR_WORD_5 | LCR_STOP_1_5);

byte LCR_GetEncoding(byte parity, byte wordLength, byte stopBits) {
   return (byte)(parity | wordLength | stopBits);
} // LCR_GetEncoding(parity, wordLength, stopBits)

Complications

That is all well and good for the initial configuration of the LCR, as long as the programmer doesn’t OR together multiple values from the same field group (a mistake that the naming convention above makes easier to spot)!
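As a quick illustration of why mixing values from one field group is dangerous, the sketch below shows that the result is not an error but a different, perfectly valid-looking encoding. The definitions repeat a few values from the header in hex (so the sketch compiles without the binary literal extension), and the two function names are hypothetical:

```c
// Same values as in UART/8250/LCR.h, written in hex for portability.
#define LCR_WORD_6      0x01
#define LCR_WORD_7      0x02
#define LCR_WORD_8      0x03
#define LCR_PARITY_ODD  0x08
#define LCR_PARITY_MARK 0x28

// ORing two Word Length values silently produces a third one.
static unsigned char lcr_mixed_word(void) {
    return (unsigned char)(LCR_WORD_6 | LCR_WORD_7);          // equals LCR_WORD_8
}

// ORing two Parity values: the one with more bits set simply wins.
static unsigned char lcr_mixed_parity(void) {
    return (unsigned char)(LCR_PARITY_ODD | LCR_PARITY_MARK); // equals LCR_PARITY_MARK
}
```

Neither the compiler nor the hardware has any way to notice that something went wrong.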

But what if one field of the compound value needs to be modified? Say the word length needs to change from 8 bits to 7 bits. Then the old value must be masked out before the new value can be ORed in:

lcr = (byte)((lcr & ~LCR_WORD_MASK) | LCR_WORD_7);

Forgetting to mask out the old value (don’t forget the ~ operator!) is a common error when dealing with bit fields. Indeed, helper macros are often defined to avoid errors like that.
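One possible shape for such a helper is sketched below. FIELD_SET and lcr_use_7_bit_words are names I made up for this example, not an established API, and the LCR values are repeated in hex so the sketch stands alone:

```c
#include <stdint.h>

// Hypothetical helper: replace the field selected by `mask` with `value`.
// The value is masked as well, so a stray bit cannot leak into another field.
#define FIELD_SET(reg, mask, value) \
    ((uint8_t)(((reg) & ~(mask)) | ((value) & (mask))))

// Same values as in UART/8250/LCR.h, written in hex for portability.
#define LCR_WORD_MASK 0x03
#define LCR_WORD_7    0x02

// Change only the word length to 7 bits, leaving every other field alone.
static uint8_t lcr_use_7_bit_words(uint8_t lcr) {
    return FIELD_SET(lcr, LCR_WORD_MASK, LCR_WORD_7);
}
```

With the ~ written once inside the macro, there is only one place to get it wrong.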

Note that sometimes it isn’t necessary to do both the & masking and the | setting: if the desired value is all 1s for the field, the masking can be omitted; if it is all 0s, the ORing can be omitted. This is not recommended, however! If a later change alters either the definition or the required value, this “optimisation” becomes a bug. It is better to let the compiler work out from the values that an operation is unnecessary than to be clever when writing the code.
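Here is a small sketch of how that shortcut goes wrong. Both function names are hypothetical. The OR-only version behaves correctly while the requested value is LCR_WORD_8 (all 1s), but not once someone changes the requirement to 7 bits:

```c
// Same values as in UART/8250/LCR.h, written in hex for portability.
#define LCR_WORD_MASK 0x03
#define LCR_WORD_6    0x01
#define LCR_WORD_7    0x02
#define LCR_WORD_8    0x03

// "Optimised" form: only correct when `word` is all 1s for the field.
static unsigned char set_word_shortcut(unsigned char lcr, unsigned char word) {
    return (unsigned char)(lcr | word);
}

// Full form: correct for any value of the field.
static unsigned char set_word_full(unsigned char lcr, unsigned char word) {
    return (unsigned char)((lcr & ~LCR_WORD_MASK) | word);
}
```

Starting from a register that holds LCR_WORD_6, asking the shortcut for LCR_WORD_7 leaves both bits set, which is the encoding for 8 bits; the full form gives the 7-bit encoding as intended.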

Extension

Another commonly used #define construct applies when the same set of values is shared by several fields within one register. For this contrived example, assume a register that controls the clock divisor input to a number of peripherals. The values are the same for every field, since it’s the field’s position that dictates which peripheral is being configured:

Clock Divisor Register (CDR)

Bit   Notes
7, 6  I²C clock divisor
      Bit 7  Bit 6  Divisor
      0      0      MCU clock ÷1
      0      1      MCU clock ÷2
      1      0      MCU clock ÷4
      1      1      MCU clock ÷8
5, 4  SPI clock divisor
      Bit 5  Bit 4  Divisor
      0      0      MCU clock ÷1
      0      1      MCU clock ÷2
      1      0      MCU clock ÷4
      1      1      MCU clock ÷8
3, 2  UART2 clock divisor
      Bit 3  Bit 2  Divisor
      0      0      MCU clock ÷1
      0      1      MCU clock ÷2
      1      0      MCU clock ÷4
      1      1      MCU clock ÷8
1, 0  UART1 clock divisor
      Bit 1  Bit 0  Divisor
      0      0      MCU clock ÷1
      0      1      MCU clock ÷2
      1      0      MCU clock ÷4
      1      1      MCU clock ÷8

This specification could easily be implemented as before, with the same values repeated for each divisor under subtly different names.
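For comparison, the repetitive version might look something like the sketch below. The macro names are my own guess at a naming scheme (the register itself is contrived), and only two of the four peripherals are written out:

```c
// Hypothetical names: one full set of divisor values per peripheral,
// each already shifted into its field position.
#define CDR_I2C_MASK  0xC0
#define CDR_I2C_DIV_1 0x00
#define CDR_I2C_DIV_2 0x40
#define CDR_I2C_DIV_4 0x80
#define CDR_I2C_DIV_8 0xC0

#define CDR_SPI_MASK  0x30
#define CDR_SPI_DIV_1 0x00
#define CDR_SPI_DIV_2 0x10
#define CDR_SPI_DIV_4 0x20
#define CDR_SPI_DIV_8 0x30

// UART2 (bits 3, 2) and UART1 (bits 1, 0) would repeat the pattern again.

// Set the SPI divisor to ÷4, exactly as for the LCR fields earlier.
static unsigned char cdr_set_spi_div_4(unsigned char cdr) {
    return (unsigned char)((cdr & ~CDR_SPI_MASK) | CDR_SPI_DIV_4);
}
```

It works, but every value has to be typed (and checked) four times over.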

An alternative would be to define the clock divisor values just once, and a shift value for each of the different peripherals:

//
// Clock/CDR.h
//
// These are the definitions for a Clock Divisor Register (CDR)
// for a contrived MCU.
//

#ifndef Clock_CDR_h
#define Clock_CDR_h

#define CDR_DIVISOR_MASK 0b11
#define CDR_DIVISOR_1    0b00
#define CDR_DIVISOR_2    0b01
#define CDR_DIVISOR_4    0b10
#define CDR_DIVISOR_8    0b11

#define CDR_SHIFT_I2C   6
#define CDR_SHIFT_SPI   4
#define CDR_SHIFT_UART2 2
#define CDR_SHIFT_UART1 0

#endif // Clock_CDR_h

Notice how the divisor values are correct for their field, but need to be shifted into position by the CDR_SHIFT_XXX values, like so:

// Set CDR for SPI to ÷4
cdr = (byte)((cdr & ~(CDR_DIVISOR_MASK << CDR_SHIFT_SPI)) |
                    (CDR_DIVISOR_4 << CDR_SHIFT_SPI));

This saves repetition, but makes for some ugly code. Again, macros are usually used to help with this.
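A minimal sketch of what such macros could look like follows. CDR_SET, CDR_GET and cdr_spi_div_4 are hypothetical names of my own, and the CDR values are repeated in hex so the sketch stands alone:

```c
#include <stdint.h>

// Same values as in Clock/CDR.h, written in hex for portability.
#define CDR_DIVISOR_MASK 0x3
#define CDR_DIVISOR_4    0x2
#define CDR_SHIFT_SPI    4

// Hypothetical helper: clear the field at `shift`, then insert `value`.
#define CDR_SET(reg, shift, value) \
    ((uint8_t)(((reg) & ~(CDR_DIVISOR_MASK << (shift))) | \
               (((value) & CDR_DIVISOR_MASK) << (shift))))

// Hypothetical helper: read the field at `shift` back as a 0..3 value.
#define CDR_GET(reg, shift) \
    ((uint8_t)(((reg) >> (shift)) & CDR_DIVISOR_MASK))

// Set the SPI divisor to ÷4 without disturbing the other peripherals.
static uint8_t cdr_spi_div_4(uint8_t cdr) {
    return CDR_SET(cdr, CDR_SHIFT_SPI, CDR_DIVISOR_4);
}
```

The shifting, masking and inverting are now written once, and the call site only names the peripheral and the divisor.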

A better(?) alternative

Most of the ugly code is generic, and the compiler already has an implementation of exactly this that hides it all, if only it were invoked. That’s the subject of the next post: using struct bit fields.


Comments are welcome. I suggest that generic comments on the whole “Bit fields” series and concepts go on the main page, while comments specific to this sub-page are written here.