Bit fields: Volatile

struct bit fields are about bits.
But that’s not the only hardware consideration…

This is the fifth post in my Bit fields series, describing how not to use bit fields, how to use them, the limitations imposed by the architecture and the compiler’s implementation, the use of volatile, and finally a show-stopper as well as a proposal to fix it.


volatile

This brings up the most important aspect of writing code to access hardware peripherals—and bit fields are not immune. Imagine an analogue-to-digital converter peripheral with a single-byte register interface. To start a conversion the top bit is set to 1, and the conversion is finished when that bit goes back to 0; the remaining seven bits then hold the digitised value of the analogue signal.

You’ll note that a non-bit-field implementation of the above is simply a byte write of 0x80 to start the process, followed by a succession of byte reads of the peripheral until the top bit is no longer set—at which point the byte value just read is already the result.
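
A minimal sketch of that non-bit-field version, assuming a hypothetical raw byte register named adcRaw (the name is illustrative, not from the earlier posts), and ignoring for the moment the volatile question discussed below:

extern byte adcRaw; // Raw byte view of the ADC register

byte RawAtoD() {
   byte result;
   adcRaw = 0x80;            // Set the top bit to start the conversion
   do {
      result = adcRaw;       // Read the whole register as a byte
   } while (result & 0x80);  // Still running? Then read again
   return result;            // Top bit is clear, so this byte is the value
} // RawAtoD()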

Definition

Using a struct bit field (and assuming an LSb compiler, and that byte has already been defined), the definition is simply:

struct ADC {
   byte value   : 7; // Digital value of analogue signal (after 'running')
   bool running : 1; // Set true to start conversion; is false when done
}; // ADC

Naïve implementation

Naïve code to implement an acquisition could be as follows:

extern ADC adc; // Address set by the linker

byte AtoD() {
   adc.running = true;
   while (adc.running) {
   } // while
   return adc.value;
} // AtoD()

And this code will probably work, as long as the compiler applies no optimisation at all. A clever compiler will see that the programmer first set a field to true and then immediately tested it for true, and would therefore simply generate an infinite loop! (And, hopefully, warn that the return statement is unreachable…)
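
In other words, with optimisation enabled the compiled result behaves as though the source had been written as follows (a sketch of the effective behaviour, not literal compiler output):

byte AtoD() {
   adc.running = true; // The write itself may still happen, but...
   while (true) {      // ...the compiler "knows" running must still be true
   } // while
} // AtoD() -- the unreachable return has vanished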

Solution

Of course, this example is exactly why K&R allowed a variable to be declared as volatile—a signal that the compiler must not assume the value of the variable will remain the same between accesses. By simply changing the above declaration of adc to:

extern volatile ADC adc;

the compiler will no longer optimise away any access to any field. It won’t “remember” that adc.running is true, and will instead re-read the value every time.

Alternative

Note that instead of declaring the whole adc variable as volatile, it is also legal to mark just the running field as volatile. This would have an identical result for the above code—but it might make the compiler change the layout of the struct: with only one field declared volatile, the compiler may decide it needs to start a whole new int to accommodate it, to avoid a potential clash with the non-volatile value field. Also note that placing the volatile keyword on a field makes every instance of that struct (including temporary variables) require volatile handling for the field, rather than just the one special hardware register.
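
For illustration, the field-level alternative would be declared something like this (a sketch; for the reasons above, the whole-variable form is usually the better choice):

struct ADC {
            byte value   : 7; // Plain, non-volatile field
   volatile bool running : 1; // volatile applies to this field only
}; // ADC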

Consequence

Read‑Modify‑Write [RMW]

Recall the LCR example from the original bit field post. Assume that the hardware register is accessed through the following variable declaration:

extern volatile LCR lcr; // Address set by the linker

Now visualise the sequence of instructions the compiler has to emit to implement the following code:

lcr.parity     = NoParity;
lcr.wordLength = Word8;
lcr.stopBits   = StopBits1;

Part of declaring any variable volatile is that, as well as making the compiler re-read the value every time it is referenced (it cannot “cache” the value for reuse later), it also demands that every write to the variable be performed immediately: the assumption is that writes are just as sensitive as reads, and so must not be optimised away either.
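
To make that concrete, here is what those three assignments now oblige the compiler to do, spelled out as comments (the bit positions are taken from the LCR layout shown later in this post):

lcr.parity     = NoParity;  // Read lcr, modify bits 3-5 only, write lcr
lcr.wordLength = Word8;     // Read lcr again, modify bits 0-1, write lcr again
lcr.stopBits   = StopBits1; // Read lcr a third time, modify bit 2, write lcr a third time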

The fact that volatile flags a variable as both read-volatile and write-volatile is (in my opinion) a shortcoming of the language. While some hardware registers may genuinely require repeated writes, even when the subsequent writes are apparently identical, this is frequently not the case. Perhaps I’ll write another language proposal to address this one too.

Inefficiency

The end result is that the above code requires three distinct Read-Modify-Write [RMW] operations, rather than one Read, three Modifies (which could collapse into a single compound Modify) and a final Write. Given that the nugatory middle Reads and Writes (being accesses to hardware rather than normal memory) may also slow the system down, it is worthwhile to write the above code instead as:

LCR temp        = lcr;       // Perform just the Read from hardware
temp.parity     = NoParity;  // Now update the
temp.wordLength = Word8;     // necessary fields while
temp.stopBits   = StopBits1; // avoiding volatile
lcr             = temp;      // Finally perform the write back to hardware

Sadly, g++ won’t accept this code as-is: it complains about the two struct assignments “losing” the volatile specification. Type-casting lcr to LCR & in both cases fixes this, while making the code less readable.
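
For reference, the version with the casts looks something like this (a sketch of the workaround just described; the C-style casts simply strip the volatile qualifier from the whole-struct copy in each direction):

LCR temp        = (LCR &)lcr;       // Cast away volatile to perform the Read
temp.parity     = NoParity;         // Update the
temp.wordLength = Word8;            // necessary fields
temp.stopBits   = StopBits1;        // as before
(LCR &)lcr      = temp;             // ...and cast again for the Write back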

The above code makes the RMW explicit, at the expense of a temporary variable (which the compiler may keep in a register anyway).

Indeed, the optimised code that gcc produces is effectively:

Read lcr from hardware into a register
Mask off the parity, wordLength and stopBits fields (AND with 0b1100'0000)
Set the required bits (OR in 0b0000'0011)
Write the register back to lcr

Four instructions instead of nine, with only two accesses to lcr instead of six. This also shows how a smart compiler can really optimise the use of struct bit fields (when volatile doesn’t get in the way) by knowing what values are being used. It might be possible for the compiler to realise the same optimisations if explicit & and | code had been used (especially if macros were invoked!), but it would be a lot harder.

Compiler signalling

What gcc does

In my previous post on compiler implementation of struct bit fields, I mentioned that the programmer may want to signal to the compiler that the structure and access requirements of the field are significant: that the underlying hardware will not tolerate a loose interpretation of accesses to the fields. In other words, to always use a fixed access size rather than the optimal one for the current circumstance.

With gcc (and g++) this signal is twofold:

  • Declare the struct bit field as volatile;
  • Use the compiler flag -fstrict-volatile-bitfields.

The latter means (from the documentation):

This option should be used if accesses to volatile bit-fields (or other structure fields, although the compiler usually honors those types anyway) should use a single access of the width of the field’s type, aligned to a natural alignment if possible. For example, targets with memory-mapped peripheral registers might require all such accesses to be 16 bits wide; with this flag you can declare all peripheral bit-fields as unsigned short (assuming short is 16 bits on these targets) to force GCC to use 16-bit accesses instead of, perhaps, a more efficient 32-bit access.

If this option is disabled, the compiler uses the most efficient instruction. In the previous example, that might be a 32-bit load instruction, even though that accesses bytes that do not contain any portion of the bit-field, or memory-mapped registers unrelated to the one being updated.

Problem

But unfortunately even this is not a universal panacæa. If the whole struct is small enough to be accessed as a byte, then even though the bit field is declared as an int the compiler will still use the size of the struct, not of the field’s type, to access the field—again breaking the hardware requirements. To overcome this, it is important to add padding to increase the size of the struct to the required width. For example, if the hardware implements the previous LCR example as a 32-bit-access register, the definition needs to be extended as follows:

struct LCR {
   enum WordLengths wordLength :  2; // Length of one word
   enum StopBits stopBits      :  1; // Number of stop bit times
   enum Parities parity        :  3; // Parity
        bool breakEnable       :  1; // Transmit Break?
        bool dLAB              :  1; // Divisor Latch Access Bit
        int                    : 24; // Padding to make struct 32 bits
//      int                    :  0; // Pity this doesn't do it instead!
}; // LCR

Note the commented-out line. According to K&R:

Fields need not be named; unnamed fields (a colon and width only) are used for padding. The special width 0 may be used to force alignment at the next int boundary.

gcc has apparently interpreted this to mean “start the next field at an int boundary”. While this is a valid interpretation of the brief sentence above, it means that in the absence of a “next” field nothing happens to the struct at all. A more useful interpretation of a 0 width, and importantly one that is both valid and backward compatible, would be “pad the current struct to the next boundary of this field’s type”. That interpretation would allow the above : 24 padding to be replaced by the commented-out line, and, importantly, leave the counting of bits to the compiler instead of the error-prone programmer.
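
In the meantime, one way to let the compiler catch a miscounted padding width is a compile-time size check (a suggestion using C++11 static_assert, not something from the original post):

static_assert(sizeof(LCR) == 4, "LCR must be exactly 32 bits wide");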

Note that I have raised this as a gcc issue—with the above : 0 padding suggestion as a follow-up enhancement idea.

Yet worse

All of the issues so far have had workarounds, either through a careful choice of definition or by writing non-intuitive code to fight inefficiency (note that the intuitive code does work; it is just not necessarily the best it can be).

But there’s one scenario with struct bit fields that is a show-stopper.


Comments are welcome. I suggest that generic comments on the whole “Bit fields” series and concepts go on the main page, while comments specific to this sub-page are written here.