This is the fifth post of my Bit fields series, describing how not to use bit fields, how to use them, the limitations imposed by architecture and the compiler’s implementation, the use of `volatile`, and finally a show-stopper as well as a proposal to fix it.
volatile
This brings up the most important aspect of writing code to access hardware peripherals—and bit fields are not immune. Imagine an analogue-to-digital converter peripheral that has a single-byte register interface. To start the conversion, the top bit is set to `1`; the conversion is finished when that bit goes back to `0`. The rest of the register is then the 7-bit digitised value of the analogue signal.
You’ll note that a non-bit-field implementation of the above allows a byte write of `0x80` to start the process, and a succession of byte reads of the peripheral until the top bit is not set—upon which the byte value that was just read is already the result.
Definition
Using a `struct` bit field (and assuming an LSb compiler, and that `byte` has already been defined), the definition is simply:
```cpp
struct ADC {
    byte value   : 7; // Digital value of Analog signal (after 'running')
    bool running : 1; // Set true to start conversion; is false when done
}; // ADC
```
Naïve implementation
Naïve code to implement an acquisition could be as follows:
```cpp
extern ADC adc; // Address set by the linker

byte AtoD() {
    adc.running = true;
    while (adc.running) {
    } // while
    return adc.value;
} // AtoD()
```
And this code will probably work: as long as the compiler is using zero optimisations. A clever compiler will see that the programmer first set a field to `true`, and then tested it for `true` immediately after, and would thus simply code an infinite loop! (And hopefully warn that the `return` statement was unreachable…)
Solution
Of course, this example is exactly why K&R allowed a variable to be declared as `volatile`—a signal that the compiler must not assume the value of the variable stays the same. By simply changing the above declaration of `adc` to:

```cpp
extern volatile ADC adc;
```

the compiler will no longer optimise any accesses to any field. It won’t “remember” that `adc.running` is `true`, and will thus re-read the value every time.
Alternative
Note that instead of declaring the whole `adc` variable as `volatile`, it is also legal to specify just the `running` field as `volatile`. This would have an identical result for the above code—but might make the compiler change the layout of the `struct`. By declaring just one field as `volatile`, the compiler may decide that it needs to start a whole new `int` to accommodate it, to avoid a potential clash with the non-volatile `value`. Also note that placing the `volatile` keyword against a field makes every instance of that `struct` (including temporary variables) require volatile handling for the field, rather than just the one special hardware register.
Consequence
Read‑Modify‑Write [RMW]
Reference the LCR example in the original bit field example. Assume that the hardware register is accessed through the following variable definition:
```cpp
extern volatile LCR lcr; // Address set by the linker
```
Now visualise what sequence of instructions the compiler has to emit to implement the following code:
```cpp
lcr.parity     = NoParity;
lcr.wordLength = Word8;
lcr.stopBits   = StopBits1;
```
Part of defining any variable `volatile` is that, as well as making the compiler re-read the value every time it is referenced (it cannot “cache” the value for reuse later), it also demands that any write to the variable be executed immediately: the assumption is that writes are just as sensitive as reads, and so should not be optimised either.
This concept of `volatile` flagging that a variable is both read-volatile as well as write-volatile is (in my opinion) a shortcoming of the language. While there are some hardware registers that may require repeated writes even when the subsequent writes are apparently identical, this is frequently not the case. Perhaps I’ll write another language proposal to address this one too.
Inefficiency
The end result of this consequence is that the above code requires three distinct Read‑Modify‑Write [RMW] operations rather than a Read, three Modifies (which could collapse to a single compound Modify) and a final Write. Given that the nugatory middle Reads and Writes (being accesses to hardware rather than normal memory) may also slow down the system, it is worthwhile to instead write the above code as:
```cpp
LCR temp = lcr;              // Perform just the Read from hardware
temp.parity     = NoParity;  // Now update the
temp.wordLength = Word8;     // necessary fields while
temp.stopBits   = StopBits1; // avoiding volatile
lcr = temp;                  // Finally perform the write back to hardware
```
Sadly, `g++` won’t accept this code as-is: it complains about the two `struct` assignments “losing” the `volatile` qualifier. Type-casting `lcr` to `LCR &` in both cases fixes this, while making the code less readable.
The above code makes the RMW explicit, at the expense of a temporary variable (which the compiler may keep in a register anyway).
Indeed, the optimised code that `gcc` produces is effectively:

1. Read `lcr` from hardware into a register;
2. Mask off the `parity`, `wordLength` and `stopBits` fields (AND with `0b1100'0000`);
3. Set the required bits (OR in `0b0000'0011`);
4. Write the register back to `lcr`.
Four instructions instead of nine, with only two accesses to `lcr` instead of six. This also shows how a smart compiler can really optimise the use of `struct` bit fields (when `volatile` doesn’t get in the way) by knowing what values are being used. It might be possible for the compiler to realise the same optimisations if explicit `&` and `|` code had been used (especially if macros were invoked!), but it would be a lot harder.
Compiler signalling
What `gcc` does
In my previous post on compiler implementation of `struct` bit fields, I mentioned that the programmer may want to signal to the compiler that the structure and access requirements of the field are significant: that the underlying hardware will not tolerate a loose interpretation of accesses to the fields. In other words, to always use a fixed access size rather than the most optimum for the current circumstance.
With `gcc` (and `g++`) this signal is twofold:

- Declare the `struct` bit field as `volatile`;
- Use the compiler flag `-fstrict-volatile-bitfields`.
The latter means (from the documentation):

> This option should be used if accesses to volatile bit-fields (or other structure fields, although the compiler usually honors those types anyway) should use a single access of the width of the field’s type, aligned to a natural alignment if possible. For example, targets with memory-mapped peripheral registers might require all such accesses to be 16 bits wide; with this flag you can declare all peripheral bit-fields as `unsigned short` (assuming short is 16 bits on these targets) to force GCC to use 16-bit accesses instead of, perhaps, a more efficient 32-bit access.
>
> If this option is disabled, the compiler uses the most efficient instruction. In the previous example, that might be a 32-bit load instruction, even though that accesses bytes that do not contain any portion of the bit-field, or memory-mapped registers unrelated to the one being updated.
Problem
But unfortunately even this is not a universal panacæa. If the whole `struct` is small enough to be accessed as a `byte`, then even though the bit field is declared as an `int`, the compiler will still use the size of the `struct`, not the field’s type, to access the field—again breaking the hardware requirements. To overcome this, it is important to add padding to increase the size of the `struct` to the required width. For example, if hardware implements the previous LCR example as a 32-bit-access register, the definition needs to be extended as follows:
```cpp
struct LCR {
    enum WordLengths wordLength  : 2;  // Length of one word
    enum StopBits    stopBits    : 1;  // Number of stop bit times
    enum Parities    parity      : 3;  // Parity
    bool             breakEnable : 1;  // Transmit Break?
    bool             dLAB        : 1;  // Divisor Latch Access Bit
    int                          : 24; // Padding to make struct 32 bits
//  int                          : 0;  // Pity this doesn't do it instead!
}; // LCR
```
Note the commented-out line. According to K&R:

> Fields need not be named; unnamed fields (a colon and width only) are used for padding. The special width `0` may be used to force alignment at the next `int` boundary.
`gcc` has apparently interpreted this to mean “start the next field at an `int` boundary”. While this is a valid interpretation of the brief sentence above, it means that in the absence of a “next” field nothing will happen to the `struct`. A more useful, and importantly both valid and backward-compatible, interpretation of a `0` width would be “pad the current `struct` to the next boundary of this field’s type”. That interpretation would allow the above `: 24` padding to be replaced by the commented-out line, and importantly would leave the counting of bits to the compiler instead of the error-prone programmer.
Note that I have raised this as a `gcc` issue—with the above `: 0` padding suggestion as a follow-up enhancement idea.
Yet worse
All of the issues until now have had workarounds, either through careful choice of definition or perhaps writing non-intuitive code to fight inefficiency (note that the intuitive code does work, but is not necessarily the best it can be).
But there’s one scenario with `struct` bit fields that is a show-stopper.
Comments are welcome. I suggest that generic comments on the whole “Bit fields” series and concepts go on the main page, while comments specific to this sub-page are written here.