I recently ran into an embedded C program using the '#pragma align' directive:
/*
* Audio buffers
*/
#pragma align(4)
static uint32_t RxBuffer1[NUM_AUDIO_SAMPLES];
#pragma align(4)
static uint32_t RxBuffer2[NUM_AUDIO_SAMPLES];
#pragma align(4)
static uint32_t TxBuffer1[NUM_AUDIO_SAMPLES];
#pragma align(4)
static uint32_t TxBuffer2[NUM_AUDIO_SAMPLES];
Note that this code excerpt is for a DSP chip, thus it's not x86-64.
After doing some research, I gather that this is a method for aligning variables in memory on a specified boundary. For example, it would allow me to place three char variables at 1-byte intervals instead of on the typical memory-word boundaries (e.g. 4-byte intervals). I understand that there are penalties for storing variables at non-word intervals: memory is fetched a word at a time, so looking at an individual byte requires extra shifting and masking.
However, I'm confused with how '#pragma align' is actually implemented. So my primary question: how does it work?
I'm hoping to get some comments regarding the following items:
- Is the '#pragma align' directive a common thing, or is it dependent on the environment you're working in (e.g. does #pragma align exist for x86)?
- Why is this a preprocessor directive? Why is the preprocessor responsible for this?
- What goes on behind the scenes when I later reference one of these oddly aligned variables? How does the compiler know that 'variable x is byte 3 of memory word 0x1ABA9'?
Edit: I'm just now realizing that the #pragma directive is intended for machine-specific compilers, so the answer to my question may be heavily influenced by the environment I'm working in. To give you more information, I'm working with an Analog Devices Blackfin+ processor.
Although it begins with # and is formally a preprocessing directive, #pragma is not expanded by the preprocessor; it is passed through to, and handled by, the compiler.
Pragma directives are compiler-specific, so the specifics of how they work depend on the compiler.
It is not standard: C11 and C++11 use the alignas specifier to achieve this. Older compilers have alternatives (such as MSVC's __declspec(align(4))), and continue to support these for compatibility with existing source code.
That said, where supported #pragma align is reasonably similar between compilers, and works in exactly the way you describe, individually specifying the alignment of data types and members of structures. It certainly exists for all common x86 compilers.
As to how it is implemented, that is compiler-specific. But in effect the compiler must tag its internal metadata for the type with the alignment requirement, so that the correct machine code is generated, offsets to struct members are calculated correctly, sizeof and pointer arithmetic work, and so forth. Each data type has a size and an alignment requirement anyway, and each member has an offset, so a pragma that changes them merely changes the information the front end sends to the back end.
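For a concrete picture, here is a minimal sketch of the standard and GCC spellings of the same request; NUM_AUDIO_SAMPLES is given an assumed value purely for illustration:

#include <stdalign.h>
#include <stdint.h>

#define NUM_AUDIO_SAMPLES 256 /* assumed value for illustration */

/* C11 alignas spelling of the alignment request (C++11 has the keyword built in) */
static alignas(4) uint32_t RxBuffer1[NUM_AUDIO_SAMPLES];

/* GCC/Clang attribute spelling of the same request */
static uint32_t RxBuffer2[NUM_AUDIO_SAMPLES] __attribute__((aligned(4)));

Since a uint32_t is normally 4-byte aligned anyway, both forms here simply make the existing requirement explicit.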
Related
I would like to develop a device driver on Linux (written in C) and a user-space library wrapping all the functions provided by my device driver (also written in C). Just to make it clearer, my library will provide the following methods:
int myOpen();
void myClose();
void mySetConf(myConfStruct conf);
etc.
The functions will use the file associated with my device driver; in particular:
- myOpen will call the open() of my device driver
- myClose will call the close() of my device driver
- mySetConf will call the ioctl() of my device driver, passing the myConfStruct as a parameter to configure the device
Assume myConfStruct is a simple structure containing something like this:
typedef struct {
uint16_t var1;
uint8_t var2;
} myConfStruct;
I would like the myConfStruct to be a structure shared between both my user-application (library) and my kernel-driver using a single header.
Are there any best practices for doing this?
I would like to have the structure defined in only one file; having it defined in multiple files seems quite error-prone if I plan on changing it in the future. But I understand that I should not include <linux/types.h> in my user-space files, and that I shouldn't use <stdint.h> inside my device driver.
So another question is: how can I define the interface between a module and the user application so that whoever implements the application is not forced to include any Linux headers?
What you are creating is a character device. The kernel documentation includes a specific section, the Linux driver implementer's guide, which you should also read. Specifically, see the ioctl based interfaces section, which describes some of the necessary considerations (regarding alignment and 64-bit fields).
As to header files, see the KernelHeaders article at kernelnewbies.org.
I would like to have the structure defined in only one file; having it defined in multiple files seems to be quite error-prone if I plan on changing it in the future.
No. You should define the structure in two separate header files: one for use in-kernel, and the other for use by userspace.
Kernel-userspace interface should be stable. You should take care to design your data structure so that you can extend it if necessary; preferably by adding some padding reserved for future use and required to be initialized to zero, and/or a version number at the beginning of the structure. Later versions must support all previous versions of the structure as well. Even if it is only a "toy" or "experimental" device driver, it is best to learn to do it right from the get-go. This stuff is much, much harder to learn to "add afterwards"; I'm talking from deep experience here.
As a character device, you should also be prepared for the driver to be compiled on other architectures besides the one you are developing on. Even byte order ("endianness") can vary, although all Linux architectures are currently either ILP32 or LP64.
Also remember that there are several hardware architectures, including x86-64, that support both 64-bit and 32-bit userspace. So, even if you believe your driver will only ever be used on x86-64, you cannot really assume the userspace is 64-bit (and not 32-bit). Look at existing code to see how it is done right; I recommend using e.g. Bootlin's Elixir to browse the Linux kernel sources.
Kernel-side header file should use __s8, __u8, __s16, __u16, __s32, __u32, __s64, or __u64. For pointers, use __u64, and u64_to_user_ptr().
Userspace-side header file should use <stdint.h> types (int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t) and uint64_t for pointers. Use a cast via (uintptr_t) for the conversion, i.e. ptr = (void *)(uintptr_t)u64; and u64 = (uintptr_t)ptr.
Ensure all members are naturally aligned. This means that an N-bit member is preceded by k×N bits of other members, where k is either zero or a positive integer. Thus, if your structure needs an unsigned and a signed 8-bit integer (one for version, one for foo), a 16-bit signed integer (bar), and a 64-bit unsigned integer (baz), and the version should come first, you'll probably want
struct kernel_side {
    __u8  version;   /* structure version; keep first */
    __s8  foo;
    __s16 bar;
    __u32 padding;   /* reserved; must be zero */
    __u64 baz;
};

struct userspace_side {
    uint8_t  version;
    int8_t   foo;
    int16_t  bar;
    uint32_t padding;
    uint64_t baz;
};
You can also have character arrays and such, but do note that a single ioctl data block is limited to at most 8191 bytes in length.
If you spend some time designing your interface structures, you'll find that careful design will avoid annoying issues like compat_ support (making them just simple wrappers). Personally, I end up creating a test version with a userspace test program to see what works best, and only then decide on the data structures.
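As an illustrative sketch only (the structure tag, header name, and ioctl magic number here are hypothetical, not taken from any real driver), the kernel-side header might tie the structure above to an ioctl command like this:

/* my_dev.h -- hypothetical kernel-side interface header */
#include <linux/types.h>
#include <linux/ioctl.h>

struct my_dev_conf {
    __u8  version;   /* structure version; keep first */
    __s8  foo;
    __s16 bar;
    __u32 padding;   /* reserved; must be zero */
    __u64 baz;
};

/* 'M' is a hypothetical ioctl magic number; the struct type encodes the size. */
#define MY_DEV_SET_CONF _IOW('M', 0x01, struct my_dev_conf)

The userspace header would mirror this with the <stdint.h> types and obtain _IOW via <sys/ioctl.h>.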
Are these attributes incompatible? The address attribute seems to be ignored, emitting no warnings (-Wall).
(For reference, EEMEM is defined in eeprom.h as: #define EEMEM __attribute__((section(".eeprom"))).)
Using a declaration like:
uint8_t storedFlags EEMEM __attribute__((address (100)));
(and similarly for all the others) results in the variables being placed in whatever order the linker prefers, ignoring my attribute. Order of attributes doesn't make a difference.
I am aware of the preferred method (creating sections and passing their locations to the linker). I was just looking to shove them around for the moment, as I'm in active development and adding and removing allocations in EEPROM; I'd rather things not move around every other build so I don't have to reprogram EEPROM from default values every damn time. Worst of all, I'm sure I've done precisely this before, and had it work. Version differences? Coincidental assignments? (I have GCC 3.4 and 8.1, not sure what that project used; I'm using 8.1 for this one.)
The documentation for the address attribute states:
Variables with the address attribute are used to address memory-mapped peripherals that may lie outside the io address range.
Looking at the AVR memory space shows the I/O addresses fall under the SRAM data memory space.
This explains why your construct doesn't work as expected since EEMEM and the address attribute map to conflicting memory sections.
Edit: Testing with avr-gcc 3.6.2 suggests that the section attribute overrides the address attribute (without a warning). Using eeprom_read_byte to read data from EEPROM, the following example is compiled correctly by avr-gcc (correct in that the address 0x0123 is passed to the eeprom_read_byte function):
#include <avr/eeprom.h>

/* Placed at address 0x0123 by the address attribute alone (no EEMEM). */
uint8_t __attribute__((address (0x0123))) storedFlags;

int main(void){
    if (eeprom_read_byte(&storedFlags) == 1){
        return 1;
    }
    return 0;
}
Edit 2: Tested on avr-gcc 11.1; it also generates correct instructions.
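For completeness, here is a sketch of the "preferred method" the question alludes to: put the variable in a dedicated named section and pin that section at link time. The section name and address below are made up for illustration (avr-ld maps EEPROM at 0x810000, so 0x810064 corresponds to EEPROM offset 100, the address used in the question):

#include <stdint.h>

/* The variable goes into its own input section... */
uint8_t storedFlags __attribute__((section(".eeprom_flags")));

/* ...which is pinned at link time, e.g.:
 *   avr-gcc ... -Wl,--section-start=.eeprom_flags=0x810064
 */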
Given a CPU architecture, is the exact binary form of a struct determined exactly?
For example, struct stat64 is used by glibc and the Linux kernel. I see glibc define it in sysdeps/unix/sysv/linux/x86/bits/stat.h as:
struct stat64 {
    __dev_t st_dev;       /* Device. */
# ifdef __x86_64__
    __ino64_t st_ino;     /* File serial number. */
    __nlink_t st_nlink;   /* Link count. */
# endif
    /* ... et cetera ... */
};
My kernel was compiled already. Now when I compile new code using this definition, the new code is binary-compatible with it. Where is this guaranteed? The only guarantees I know of are:
- The first element has offset 0
- Elements declared later have higher offsets
So if the kernel code declares struct stat64 in the exact same way (in the C code), then I know that the binary form has:
st_dev # offset 0
st_ino # offset at least sizeof(__dev_t)
But I'm not currently aware of any way to determine the offset of st_ino. Kernighan & Ritchie give the simple example
struct X {
    char c;
    int i;
};
where on my x86-64 machine, offsetof(struct X, i) == 4. Perhaps there are some general alignment rules that determine the exact binary form of a struct for each CPU architecture?
Given a CPU architecture, is the exact binary form of a struct determined exactly?
No, the representation or layout (“binary form”) of a structure is ultimately determined by the C implementation, not by the CPU architecture. Most C implementations intended for normal purposes follow recommendations provided by the manufacturer and/or the operating system. However, there may be circumstances where, for example, a certain alignment for a particular type might give slightly better performance but is not required, and so one C implementation might choose to require that alignment while another does not, and this can result in different structure layout.
In addition, a C implementation might be designed for special purposes, such as providing compatibility with legacy code, in which case it might choose to replicate the alignment of some old compiler for another architecture rather than to use the alignment required by the target processor.
However, let’s consider structures in separate compilations using one C implementation. Then C 2018 6.2.7 1 says:
… Moreover, two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements: If one is declared with a tag, the other shall be declared with the same tag. If both are completed anywhere within their respective translation units, then the following additional requirements apply: there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types; if one member of the pair is declared with an alignment specifier, the other is declared with an equivalent alignment specifier; and if one member of the pair is declared with a name, the other is declared with the same name. For two structures, corresponding members shall be declared in the same order. For two structures or unions, corresponding bit-fields shall have the same widths…
Therefore, if two structures are declared identically in separate translation units, or with the minor variations permitted in that passage, then they are compatible, which effectively means they have the same layout or representation.
Technically, that passage applies only to separate translation units of the same program. The C standard defines behaviors for one program; it does not explicitly define interactions between programs (or fragments of programs, such as kernel extensions) and the operating system, although to some extent you might consider the operating system and everything running in it as one program. However, for practical purposes, it applies to everything compiled with that C implementation.
This means that as long as you use the same C implementation as the kernel is compiled with, identically declared structures will have the same representation.
Another consideration is that we might use different compilers for compiling the kernel and compiling programs. The kernel might be compiled with Clang while a user prefers to use GCC. In this case, it is a matter for the compilers to document their behaviors. The C standard does not guarantee compatibility, but the compilers can, if they choose to, perhaps by both documenting that they adhere to a particular Application Binary Interface (ABI).
Also note that a "C implementation" as discussed above is not just a particular compiler but a particular compiler with particular switches. Various switches may change how a compiler behaves in ways that cause it to be effectively a different C implementation, such as switches to conform to one version of the C standard or another, switches affecting whether structures are packed, switches affecting sizes of integer types, and so on.
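In practice, an assumed layout can at least be verified at compile time. Here is a minimal sketch, assuming the x86-64 System V ABI layout for the K&R example above:

#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

struct X {
    char c;   /* offset 0 */
    int i;    /* offset 4: int is 4-byte aligned in this ABI */
};

/* Fail the build if this compiler's layout differs from the assumption. */
static_assert(offsetof(struct X, i) == 4, "unexpected padding before i");
static_assert(sizeof(struct X) == 8, "unexpected trailing padding");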
I oftentimes write to memory-mapped I/O pins like this:
P3OUT |= BIT1;
I assumed that P3OUT was being replaced with something like this by my preprocessor:
*((unsigned short *) 0x0222u)
But I dug into an H file today and saw something along these lines:
volatile unsigned short P3OUT # 0x0222u;
There's some more macro expansion going on before that, but that is the general form. A '#' symbol is being used. Above that there are some #pragmas about using an extended set of the C language. I am assuming this is some sort of directive to the linker, effectively defining a symbol as being at that location in the memory map.
Was my assumption right about what happens most of the time on most compilers? Does it matter one way or the other? Where did that # notation come from; is it some sort of standard?
I am using IAR Embedded workbench.
This question is similar to this one: How to place a variable at a given absolute address in memory (with GCC).
It matches what I assumed my compiler was doing anyway.
Although an expression like (unsigned char *)0x1234 will, on many compilers, yield a pointer to hardware address 0x1234, nothing in the standard requires any particular relationship between an integer which is cast to a pointer and the resulting address. The only thing which the standard specifies is that if a particular integer type is at least as large as intptr_t, and casting a pointer to that particular type yields some value, then casting that particular value back to the original pointer type will yield a pointer equivalent to the original.
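As a sketch of the one round-trip the standard does guarantee (uintptr_t is optional, but virtually all hosted implementations provide it):

#include <stdint.h>

void demo(void)
{
    int x;
    int *p = &x;
    uintptr_t n = (uintptr_t)p; /* pointer -> integer: some value */
    int *q = (int *)n;          /* integer -> pointer: compares equal to p */
    (void)q;
}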
The IAR compiler offers a non-standard extension which allows the programmer to request that variables be placed at specified hard-coded addresses. This offers some advantages compared to using macros to create pointer expressions. For one thing, it ensures that such variables will be regarded syntactically as variables; while pointer-kludge expressions will generally be interpreted correctly when used in legitimate code, it's possible for illegitimate code, which should fail with a compile-time error, to compile but produce something other than the desired effect. Further, the IAR syntax defines symbols which are visible to the linker and may thus be used within assembly-language modules. By contrast, a .h file which defines pointer-kludge macros will not be usable within an assembly-language module; any hardware register that is used in both C and assembly code will need to have its address specified in two separate places.
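For comparison, the pointer-kludge form the asker expected looks like the sketch below; the address and register name are taken from the question, and the BIT1 definition is an assumption made for illustration:

/* Classic pointer-macro approach: P3OUT is a usable lvalue, but no
 * linker symbol is created, so assembly modules cannot reference it. */
#define P3OUT (*(volatile unsigned short *)0x0222u)
#define BIT1 (1u << 1) /* assumed definition for illustration */

void set_pin(void)
{
    P3OUT |= BIT1; /* read-modify-write of the memory-mapped register */
}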
The short answer to the question in your title is "differently". What's worse is that compilers from different vendors for the same target processor will use different approaches. This one
volatile unsigned short P3OUT # 0x0222u;
is a common way to place a variable at a fixed address. But you will also see it used to identify individual bits within a memory-mapped location, especially for microcontrollers which have bit-wide instructions, like the PIC families.
These are things that the C Standard does not address and, IMHO, should, as small embedded microcontrollers will eventually end up being the main market for C (yes, I know the kernel is written in C, but a lot of user-space stuff is moving to C++).
I actually joined the C committee to try and drive for changes in this area, but my sponsorship went away and it's a very expensive hobby.
A similar area is declaring a function to be an ISR.
This document shows one of the approaches we considered.
What is the best way to use constants in CUDA?
One way is to define constants in constant memory, like:
// CUDA global constants
__constant__ int M;

int main(void)
{
    int h_M = 42; // host-side value to copy into the device symbol
    ...
    cudaMemcpyToSymbol(M, &h_M, sizeof(h_M)); // pass the symbol itself, not a string
    ...
}
An alternative way would be to use the C preprocessor:
#define M ...
I would think defining constants with the C preprocessor is much faster. What, then, are the benefits of using constant memory on a CUDA device?
1. Constants that are known at compile time should be defined using preprocessor macros (e.g. #define) or via C/C++ const variables at global/file scope.
2. Usage of __constant__ memory may be beneficial for programs that use certain values which don't change for the duration of the kernel and for which certain access patterns are present (e.g. all threads access the same value at the same time). This is not better or faster than constants that satisfy the requirements of item 1 above.
3. If the number of choices to be made by a program is relatively small, and these choices affect kernel execution, one possible approach for additional compile-time optimization is to use templated code/kernels, as sketched below.
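Here is a minimal sketch of that templated-kernel idea (names are illustrative; error checking omitted):

// Compile-time constant baked in via a template parameter; the compiler
// emits one specialized kernel per instantiation.
template <int M>
__global__ void scale_by(int *data)
{
    data[threadIdx.x] *= M; // M folds directly into the generated code
}

int main(void)
{
    int h = 7, *d;
    cudaMalloc(&d, sizeof(int));
    cudaMemcpy(d, &h, sizeof(int), cudaMemcpyHostToDevice);
    scale_by<3><<<1, 1>>>(d); // the program's "choice" picks an instantiation
    cudaMemcpy(&h, d, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h == 21 ? 0 : 1;
}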
Regular C/C++ style constants: in CUDA C (itself a modification of C99), constants are absolute compile-time entities. This is hardly surprising given how heavily NVCC optimizes, which the nature of GPU processing demands.
#define: macros are as always very inelegant but useful in a pinch.
The __constant__ variable specifier is, however, a completely new animal, and something of a misnomer in my opinion. I will put down what Nvidia's documentation says in the space below:
The __constant__ qualifier, optionally used together with __device__, declares a variable that:
- Resides in constant memory space,
- Has the lifetime of an application,
- Is accessible from all the threads within the grid and from the host through the runtime library (cudaGetSymbolAddress() / cudaGetSymbolSize() / cudaMemcpyToSymbol() / cudaMemcpyFromSymbol()).
Nvidia's documentation specifies that __constant__ memory is accessible at register-level speed (near-zero latency) provided all threads of a warp are accessing the same constant.
They are declared at global scope in CUDA code. However, based on personal (and currently ongoing) experience, you have to be careful with this specifier when it comes to separate compilation, such as separating your CUDA code (.cu and .cuh files) from your C/C++ code by putting wrapper functions in C-style headers.
Unlike traditionally "constant"-specified variables, however, these are initialized at runtime from the host code that allocates device memory and ultimately launches the kernel. I repeat: I am currently working on code that demonstrates these can be set at runtime, using cudaMemcpyToSymbol() before kernel execution.
They are quite handy, to say the least, given the L1-cache-level speed that is guaranteed for accesses.
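To make that last point concrete, here is a minimal sketch (illustrative names; error checking omitted) of setting a __constant__ from the host at runtime and reading it in a kernel:

#include <cstdio>

__constant__ int scale; // resides in constant memory space

__global__ void mul(int *data)
{
    // All threads of the warp read the same value: the broadcast case.
    data[threadIdx.x] *= scale;
}

int main(void)
{
    int h_scale = 3; // value chosen at runtime on the host
    cudaMemcpyToSymbol(scale, &h_scale, sizeof(h_scale));

    int h_data[32], *d_data;
    for (int i = 0; i < 32; ++i) h_data[i] = i;
    cudaMalloc(&d_data, sizeof(h_data));
    cudaMemcpy(d_data, h_data, sizeof(h_data), cudaMemcpyHostToDevice);

    mul<<<1, 32>>>(d_data);

    cudaMemcpy(h_data, d_data, sizeof(h_data), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    printf("h_data[5] = %d\n", h_data[5]); // expect 15
    return 0;
}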