Using constants with CUDA - c

Which is the best way of using constants in CUDA?
One way is to define constants in constant memory, like:
// CUDA global constants
__constant__ int M;
int main(void)
{
...
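cudaMemcpyToSymbol(M, &h_M, sizeof(M)); // h_M: host-side copy of the value (name assumed); pass the symbol itself, since string symbol names were removed in CUDA 5.0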
...
}
An alternative way would be to use the C preprocessor:
#define M ...
I would think defining constants with the C preprocessor is much faster. What, then, are the benefits of using the constant memory on a CUDA device?

1. Constants that are known at compile time should be defined using preprocessor macros (e.g. #define) or via C/C++ const variables at global/file scope.
2. Use of __constant__ memory may be beneficial for programs that use certain values that don't change for the duration of the kernel and for which certain access patterns are present (e.g. all threads access the same value at the same time). This is not better or faster than constants that satisfy the requirements of item 1 above.
3. If the number of choices to be made by a program is relatively small, and these choices affect kernel execution, one possible approach for additional compile-time optimization would be to use templated code/kernels.

Regular C/C++ style constants: In CUDA C/C++ (an extension of C/C++), these are purely compile-time entities. This is hardly surprising given how much optimization NVCC performs for GPU code.
#define: macros are as always very inelegant but useful in a pinch.
The __constant__ variable specifier is, however, a completely new animal and something of a misnomer in my opinion. I will put down what Nvidia has here in the space below:
The __constant__ qualifier, optionally used together with
__device__, declares a variable that:
Resides in constant memory space,
Has the lifetime of an application,
Is accessible from all the threads within the grid and from the host through the runtime library (cudaGetSymbolAddress() /
cudaGetSymbolSize() / cudaMemcpyToSymbol() / cudaMemcpyFromSymbol()).
Nvidia's documentation states that reading __constant__ memory can be as fast as reading a register, provided the value is in the constant cache and all threads of a warp access the same address. Accesses to different addresses within a warp are serialized.
They are declared at global scope in CUDA code. HOWEVER based on personal (and currently ongoing) experience you have to be careful with this specifier when it comes to separate compilation, like separating your CUDA code (.cu and .cuh files) from your C/C++ code by putting wrapper functions in C-style headers.
Unlike traditional "constant" variables, however, these are initialized at runtime from the host code that allocates device memory and ultimately launches the kernel. To repeat: I am currently working on code that demonstrates they can be set at runtime using cudaMemcpyToSymbol() before kernel execution.
They are quite handy, to say the least, given the cache-level access speed you get when all threads of a warp read the same value.

Related

How can I hide the contents of a user-exposed C preprocessor definition in non-user code?

In my C89 code, I have several units implementing a variety of abstract buffers which are to be treated by the user as if they were classes. That is, there is a public header defining the interfacing functions, and this is all the user ever sees. They are not intended to (need to) know what is going on behind the scenes.
However, at buffer creation, a raw byte-buffer is passed to the creation function, so the user must be able to know how much raw buffer space to allocate at compile time. This requires knowing how much space one item takes up in each abstract type. We are coding for a very limited embedded environment.
Currently, each buffer type has a private header in which a struct defines the format of the data. It is simple to add a macro for the size of the data element:
#define MY_ELEMENT_SIZE (sizeof(component_1_type) + sizeof(component_2_type))
However, component_x_type is intended to be hidden from the user, so this definition cannot go in the public header with the prototypes for the interfacing functions.
Our next idea was to have a const variable in the source:
const int MY_ELEMENT_SIZE = sizeof(component_1_type) + sizeof(component_2_type);
and an extern declaration in the public header:
extern const int MY_ELEMENT_SIZE;
But, because this is C89 and we have pedantry and MISRA and other requirements to fulfill, we cannot use variable-length arrays. In a "user" source file, to get a 50-element raw buffer, we write:
char rawBuffer[50 * MY_ELEMENT_SIZE] = {0u};
Using the extern const... method, this results in the compilation error:
error: variably modified ‘rawBuffer’ at file scope
This was not totally unexpected, but is disappointing in that sizeof(any_type) is genuinely constant and known at compile time.
Please advise me on how to expose the size of the data element in the public header without making the existence of component_x_type known to the user, in such a way that it can be used as an array length in C89.
Many, many thanks.
In my C89 code
It is 2020 now. Discuss with your manager or client the opportunity to use a less obsolete C standard. In practice, most hand-written C89 code can be reasonably ported to C11, and you could use, buy or develop code refactoring tools (or services) to help you with that (e.g. your own GCC plugin). Remind your manager or client that technical debt has a real cost (probably tens of thousands of US$ or €). Notice that old C89 compilers in practice optimize much less than recent ones, and that most junior developers (your future colleagues) are not even familiar with C89 (so they would need some kind of training, which costs a lot).
How can I hide the contents of a user-exposed C preprocessor definition in non-user code?
As far as I know, you cannot (in theory). Check by reading the C11 standard n1570. Read also the documentation of GNU cpp then of GCC (or of your C compiler).
we have pedantry and MISRA and other requirements to fulfill
Be aware that these requirements have costs. Remind your client or manager of these costs.
(about hiding the content of a user-exposed C preprocessor #define)
However, in practice, a C code (e.g. inside some internal header file #include-d in your translation unit) can be generated, and this is common practice (look into GNU bison or SWIG for a well known example of C code generator, and also consider using GNU m4 or gpp or your own Guile or Python script, or your own C or C++ program emitting C code). You simply have to configure your build automation infrastructure (e.g. write your Makefile) for such a case.
If you have some script or utility generating things like #define MACRO_7oa7eIzzcxv03Tm (where MACRO_7oa7eIzzcxv03Tm is some pseudo-random or name mangled identifier) then the probability of an accidental collision with client code is quite small. A human programmer is very unlikely to think of such identifiers, and with enough care a C generating script usually won't emit identifiers colliding with that. See also this answer.
Perhaps your client or manager allows you to use (on your desktop) some generator of such "random-looking" identifier. AFAIK, they are MISRA compatible (but my MISRA standard is at office, and I am -may 2020- currently Covid19 confined at home, near Paris, France).
we cannot use variable-length arrays.
You could (with approval from manager and client) consider using struct-s with flexible array members or else use arrays of dimension 0 or 1 as the last member of your struct-s. IIRC, that was common practice in SunOS3.2
Consider also using tools like Frama-C, Clang static analyzer, or -at end of 2020- my Bismon coupled with a recent GCC. Think of subcontracting the code review of your source code.
Additional to the other answers, this is a quite primitive proposal. But it is easy to understand.
Since presumably you will not publish your header files to your clients too often, and so will not change the sizes of the types, you can use a (manually or automatically) calculated definition:
#define OUR_LIB_TYPE_X_SIZE 23
In your private sources you can then check the correctness of this assumption for example by
typedef char assert_type_x_has_size[2 * (sizeof (TypeX) == OUR_LIB_TYPE_X_SIZE) - 1];
It will error on any decent compiler on unequal sizes, because the array's size will be -1 and illegal. On equal sizes, the array's size is 1 and legal.
Because you're just defining a type, no code or memory is allocated. You might need to mark this as "unused" for some compilers or code checkers.
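A minimal sketch of the approach described above, collapsed into one file for illustration. The layout of TypeX and the value 8 are made up for this example; in a real project the macro would sit in the public header and the typedef check in the private source.

```c
#include <stddef.h>

/* Public header (assumed): the only thing the user sees is a number. */
#define OUR_LIB_TYPE_X_SIZE 8

/* Private source (assumed layout, hidden from the user). */
typedef struct {
    unsigned char component_1[4];
    unsigned char component_2[4];
} TypeX;

/* Compile-time check: the array size is 1 if the sizes match,
   and -1 (a compile error) if they do not. */
typedef char assert_type_x_has_size[2 * (sizeof(TypeX) == OUR_LIB_TYPE_X_SIZE) - 1];

/* User code: a 50-element raw buffer sized by a true constant expression,
   so no variable-length array is involved. */
unsigned char rawBuffer[50 * OUR_LIB_TYPE_X_SIZE];

/* Helper (hypothetical) that reports the buffer size in bytes. */
size_t raw_buffer_bytes(void)
{
    return sizeof(rawBuffer);
}
```

If someone later changes TypeX without updating OUR_LIB_TYPE_X_SIZE, the private source stops compiling, which is the whole point of the check.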
I've encountered this very problem too - unfortunately private encapsulation also makes the object size encapsulated. Sometimes it is sufficient to simply return the object size through a getter function, but not always.
I solved it exactly as KamilCuk showed in comments: give the caller a raw "magic number" through a #define in the .h file, then a static assert inside the .c implementation checking that the define is consistent with the object size.
If that's not elegant enough, then perhaps you could consider outsourcing the size allocation to a run-time API from the "class":
uint8_t* component1_get_raw_buffer (size_t n);
Where you return a pointer to a statically allocated buffer inside the encapsulated "class". The caller code would then have to be changed to:
uint8_t* raw_buffer;
raw_buffer = component1_get_raw_buffer(50);
This involves some internal trickery to keep track of how much memory has been allocated (and error handling, maybe returning NULL on failure). You will need to reserve a fixed maximum size for the internal static buffer, to cover the worst-case scenario.
(Optionally: const qualify the returned pointer if the user isn't supposed to modify the data)
Advantages are: better OO design, no heap allocation, remain MISRA-C compliant. Disadvantages are function call overhead during initialization and the need to set aside "enough" memory in advance.
Also, this method isn't very safe in a multi-threading environment, but that's not usually an issue in embedded systems.
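One possible shape for the run-time API described above, as a rough sketch. Only the function name component1_get_raw_buffer comes from the answer; the element size, the pool size and the bump-style bookkeeping are assumptions for illustration, and the code is not thread-safe.

```c
#include <stddef.h>
#include <stdint.h>

#define COMPONENT1_ELEMENT_SIZE 8u     /* assumed private element size in bytes */
#define COMPONENT1_POOL_BYTES   1024u  /* assumed worst-case budget */

/* Statically allocated pool hidden inside the "class". */
static uint8_t component1_pool[COMPONENT1_POOL_BYTES];
static size_t  component1_used = 0u;

/* Hands out room for n elements from the static pool, or NULL if it does not fit. */
uint8_t *component1_get_raw_buffer(size_t n)
{
    size_t remaining = COMPONENT1_POOL_BYTES - component1_used;
    uint8_t *result;

    /* Dividing instead of multiplying avoids overflow in n * element size. */
    if (n > remaining / COMPONENT1_ELEMENT_SIZE) {
        return NULL;
    }

    result = &component1_pool[component1_used];
    component1_used += n * COMPONENT1_ELEMENT_SIZE;
    return result;
}
```

The caller never learns the element size; it only asks for a count and checks the result against NULL.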

How do most embedded C compilers define symbols for memory mapped I/O?

I often write to memory mapped I/O pins like this
P3OUT |= BIT1;
I assumed that P3OUT was being replaced with something like this by my preprocessor:
*((unsigned short *) 0x0222u)
But I dug into an H file today and saw something along these lines:
volatile unsigned short P3OUT @ 0x0222u;
There's some more expansion going on before that, but it is generally that. An '@' symbol is being used. Above that there are some #pragmas about using an extended set of the C language. I am assuming this is some sort of directive to the linker and effectively a symbol is being defined as being at that location in the memory map.
Was my assumption right for what happens most of the time on most compilers? Does it matter one way or the other? Where did that @ notation come from, is it some sort of standard?
I am using IAR Embedded workbench.
This question is similar to this one: How to place a variable at a given absolute address in memory (with GCC).
It matches what I assumed my compiler was doing anyway.
Although an expression like (unsigned char *)0x1234 will, on many compilers, yield a pointer to hardware address 0x1234, nothing in the standard requires any particular relationship between an integer which is cast to a pointer and the resulting address. The only thing which the standard specifies is that if a particular integer type is at least as large as intptr_t, and casting a pointer to that particular type yields some value, then casting that particular value back to the original pointer type will yield a pointer equivalent to the original.
The IAR compiler offers a non-standard extension which allows the programmer to request that variables be placed at specified hard-coded addresses. This offers some advantages compared to using macros to create pointer expressions. For one thing, it ensures that such variables will be regarded syntactically as variables; while pointer-kludge expressions will generally be interpreted correctly when used in legitimate code, it's possible for illegitimate code which should fail with a compile-time error to compile but produce something other than the desired effect. Further, the IAR syntax defines symbols which are available to the linker and may thus be used within assembly-language modules. By contrast, a .H file which defines pointer-kludge macros will not be usable within an assembly-language module; any hardware which will be used in both C and assembly code will need to have its address specified in two separate places.
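To illustrate the pointer-cast form the question assumed, here is a host-runnable sketch. The register is faked with an ordinary variable (fake_p3out, a made-up name) so the code can run on a PC; on the target, the address expression would be the cast shown in the comment instead. The address 0x0222 and the name BIT1 come from the question; the value of BIT1 and everything else are assumptions.

```c
#include <stdint.h>

/* Stand-in for the hardware register so the sketch runs on a host. */
static volatile uint16_t fake_p3out = 0u;

/* On the target this would be: ((volatile uint16_t *)0x0222u) */
#define P3OUT_ADDR (&fake_p3out)

/* The "pointer-kludge" macro: P3OUT behaves like an lvalue. */
#define P3OUT (*P3OUT_ADDR)
#define BIT1  0x0002u

/* Read-modify-write of the register, exactly as in the question. */
void set_bit1(void)
{
    P3OUT |= BIT1;
}

/* Reads the current register value back. */
uint16_t read_p3out(void)
{
    return P3OUT;
}
```

The volatile qualifier matters here: without it the compiler is free to cache or drop accesses to the register.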
The short answer to the question in your title is "differently". What's worse is that compilers from different vendors for the same target processor will use different approaches. This one
volatile unsigned short P3OUT @ 0x0222u;
is a common way to place a variable at a fixed address. But you will also see it used to identify individual bits within a memory mapped location, especially for microcontrollers which have bit-wide instructions like the PIC families.
These are things that the C Standard does not address, and should IMHO, as small embedded microcontrollers will eventually end up being the main market for C (yes, I know the kernel is written in C, but a lot of user-space stuff is moving to C++).
I actually joined the C committee to try and drive for changes in this area, but my sponsorship went away and it's a very expensive hobby.
A similar area is declaring a function to be an ISR.
This document shows one of the approaches we considered

Best practice on writing constant parameters for embedded systems

This is a case of "static const" vs "#define" in C, for embedded systems.
On large/mid projects with "passed-down" code and modules, what is the best practice on writing constant parameters for your include files, modules, etc?
In a code "passed-down" where you don't know if the names you're choosing are defined in some other included file or might be called with extern or as macros in some other file that might include your file.
Having these 3 options:
static const int char_height = 12;
#define CHAR_HEIGHT 12
enum { char_height = 12 };
which one would be better (on an embedded system with unknown memory constraints)?
The original code uses mainly #define's for this, but these kind of constants are haphazardly implemented in several ways (and at different locations even in the same files) since it seems several people developed this demo software for a certain device.
Specifically, this is a demo code, showing off every hardware and SDK feature of a certain device.
Most of the data I'm thinking about is the kind used to configure the environment: screen dimensions, charset characteristics, something to improve the readability of the code. Not on the automatic configuration a compiler and pre-processor could do. But since there's a lot of code in there and I'm afraid of global name conflicts, I'm reluctant to use #define's
Currently, I'm considering that it would be better to rewrite the project from scratch and re-implement most of the already written functions to get their constants from just one c file or reorganize the constants' implementation to just one style.
But:
This is a one person project (so it would take a lot of time to re-implement everything)
The already implemented code works and it has been revised several times. (If it's not broken...)
Always consider readability and memory constraints. Also, macros are simply copy/paste operations that occur before compilation. With that being said I like to do the following:
I define all variables that are constant as being static const if they are to be used in one c file (e.g. not globally accessible across multiple files). Anything defined as const shall be placed in ROM when at file scope. Obviously you cannot change these variables after they're initialized.
I define all constant values using #define.
I use enumerations where it adds to readability. Any place where you have a fixed range of values I prefer enumerations to explicitly state the intent.
Try to approach the project with an object oriented perspective (even though c isn't OO). Hide private functions (don't create a prototype in the header), do not use globals if you can avoid it, mark variables that should only reside in one c module (file) as static, etc.
They are 3 different things that should be used in 3 different situations.
#define should be used for constants that need to be evaluated at compile time. One typical example is the size of a statically allocated array, i.e.
#define N 10
int x[N];
It is also fine to use #define for all constants where it doesn't matter how or where the constant is allocated. People who claim that it is bad practice to do so only voice their own, personal, subjective opinions.
But of course, for such cases you can also use const variables. There is no important difference between #define and const, except for the following cases:
const should be used where it matters at what memory address a constant is allocated. It should also be used for variables that the programmer will likely change often, because if you use const, you can easily move the variable to a memory segment in EEPROM or data flash (but if you do so, you need to declare it as volatile).
Another slight advantage of const is that you get stronger type safety than a #define. For the #define to get equal type safety, you have to add explicit type casts in the macro, which might get a bit harder to read.
And then of course, since consts (and enums) are variables, you can reduce their scope with the static keyword. This is good practice since such variables do not clutter the global namespace. That said, in 99% of all cases the true source of name conflicts in the global namespace is a poor naming policy, or no naming policy at all. If you follow no coding standard, then that is the true source of the problem.
So generally it is fine to make constants global when needed, it is rather harmless practice as long as you have a sane naming policy (preferably all items belonging to one code module should share the same naming prefix). This shouldn't be confused with the practice of making regular variables global, which is always a very bad idea.
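The point about #define being needed where a value must be evaluated at compile time can be sketched with the three options from the question. The array names are made up for this example. In standard C a const int object is not an integer constant expression, so the commented-out line is not valid at file scope (some compilers accept it as an extension).

```c
#include <stddef.h>

#define CHAR_HEIGHT_MACRO 12              /* textual substitution, no storage */
enum { CHAR_HEIGHT_ENUM = 12 };           /* integer constant expression of type int */
static const int char_height_const = 12;  /* an object, not a constant expression in C */

int rows_a[CHAR_HEIGHT_MACRO];            /* fine */
int rows_b[CHAR_HEIGHT_ENUM];             /* fine */
/* int rows_c[char_height_const]; */      /* not valid standard C at file scope */

/* Element counts derived from the array types themselves. */
size_t rows_a_len(void) { return sizeof(rows_a) / sizeof(rows_a[0]); }
size_t rows_b_len(void) { return sizeof(rows_b) / sizeof(rows_b[0]); }

/* The const object is still perfectly usable as an ordinary value. */
int const_value(void) { return char_height_const; }
```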
Enums should only be used when you have several constant values that are related to each other and you want to create a special type, such as:
typedef enum
{
OK,
ERROR_SOMETHING,
ERROR_SOMETHING_ELSE
} error_t;
One advantage of the enum is that you can use a classic trick to get the number of enumerated items as another compile-time constant "free of charge":
typedef enum
{
OK,
ERROR_SOMETHING,
ERROR_SOMETHING_ELSE,
ERRORS_N // the number of constants in this enum
} error_t;
But there are various pitfalls with enums, so they should always be used with caution.
The major disadvantage of enum is that it isn't type safe, nor is it "type sane". First of all, enumeration constants (like OK in the above example) are always of the type int, which is signed.
The enumerated type itself (error_t in my example) can however be of any type compatible with char or int, signed or unsigned. Take a guess, it is implementation-defined and non-portable. Therefore you should avoid enums, particularly as part of various data byte mappings or as part of arithmetic operations.
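A small sketch of both points: the "free" count from the last enumerator, and the fact that an enumeration constant has type int regardless of what the compiler picks for the enumerated type. The type is renamed status_t here (an assumption) only to avoid clashing with error_t on some platforms; the name table is likewise made up.

```c
#include <stddef.h>

typedef enum
{
    STATUS_OK,
    STATUS_ERROR_SOMETHING,
    STATUS_ERROR_SOMETHING_ELSE,
    STATUS_N /* the number of constants in this enum */
} status_t;

/* The count doubles as an array dimension, so the table cannot silently
   grow larger than the enum. */
static const char *const status_names[STATUS_N] = {
    "ok", "something", "something else"
};

/* Maps a status to its name, guarding against out-of-range values. */
const char *status_name(status_t s)
{
    return ((unsigned int)s < (unsigned int)STATUS_N) ? status_names[s] : "unknown";
}

/* An enumeration constant has type int, so this equals sizeof(int). */
size_t enum_constant_size(void)
{
    return sizeof(STATUS_OK);
}
```

By contrast, sizeof(status_t) is left to the implementation, which is why the answer warns against using enums in byte-level data mappings.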
I agree with bblincoe...+1
I wonder if you understand what the differences are in that syntax and how it can/might affect implementation. Some folks may not care about implementation but if you are moving into embedded perhaps you should.
When bblincoe mentions ROM instead of RAM.
static const int char_height = 12;
That should, ideally, consume .text real estate and pre-initialize that real estate with the value you specified. Being const, you won't change it, but it does have a placeholder. Now why would you need a placeholder for a constant? Think about that: certainly you could hack the binary down the road for some reason to turn something on or off or change a board-specific tuning parameter...
Without volatile, though, the compiler does not have to always use that .text location; it can optimize and put that value into instructions directly, or even worse, optimize math operations and remove some math.
The define and enum do not consume storage; they are constants that the compiler chooses how to implement. Ultimately those bits, if they are not optimized away, land somewhere in .text, sometimes everywhere in .text, depending on the instruction set, how its immediates work, the specific constant, etc.
So define vs enum is basically a question of whether you want to pick all the values yourself or have the compiler pick some for you: use define if you want to control them, enum if you want the compiler to choose the values.
So it really isn't a best-practice thing at all; it is a case of determining what your program needs to do and choosing the appropriate programming solution for that situation.
Depending on the compiler and the target processor, choosing volatile static const int vs not doing that can affect the rom consumption. But it is a very specific optimization, and not a general answer (and has nothing to do with embedded but with compiling in general).
Dan Saks explains why he prefers the enumeration constant in these articles, Symbolic Constants and Enumeration Constants vs Constant Objects. In summary, avoid macros because they don't observe the usual scope rules and the symbolic names are typically not preserved for symbolic debuggers. And prefer enumeration constants because they are not susceptible to a performance penalty that may affect constant objects. There are many more details in the linked articles.
Another thing to consider is performance. A #define constant can usually be accessed faster than a const variable (for integers) since the const will need to be fetched from ROM (or RAM) and the #define value will usually be an immediate instruction argument so it is fetched along with the instruction (no extra cycles).
As for naming conflicts, I like to use prefixes like MOD_OPT_ where MOD is the module name OPT means that the define is a compile-time option, etc. Also only include the #defines in your header files if they're part of the public API, otherwise use an .inc file if they're needed in multiple source files or define them in the source file itself if they're only specific to that file.

Purpose of the ATOMIC_INIT macro in the Linux kernel

I'm reading the Linux Device Drivers 3rd Edition book online and I'm having trouble understanding the initialization macro for atomic variables:
static atomic_t foobar = ATOMIC_INIT(1);
I've looked through the source code for the Linux kernel v3.2, but I've only come up with two definitions:
#define ATOMIC_INIT(i) { (i) }
and
#define ATOMIC_INIT(i) ((atomic_t) { (i) })
The second version of the definition for the macro seems to be functionally the same as the first -- in fact, it seems redundant to even have an explicit cast when the value would be implicitly cast anyway to atomic_t. Why are there two versions of the definition?
Is the purpose of the ATOMIC_INIT macro just to keep code from breaking if the atomic_t structure changes in a future release of the Linux kernel?
Many atomic operations must be implemented separately for each architecture.
The purpose of the various macros and functions in atomic.h is to hide the differences between architectures.
In practice, all architectures use a single 32-bit variable to implement atomic_t, so there is no practical difference in the various ATOMIC_INIT macros; all the interesting stuff happens in the operations.
But the internals might change (and did change once for 32-bit SPARC), so you should always use the official API.
The difference between the two different forms of ATOMIC_INIT is that the first can only be used in initializations, the second can be used in initializations and assignments. At a first glance this sounds as if the second would be preferable, but it has an important use case where it can't be applied: block scope variables that are declared with static storage specification. In block scope
static atomic_t foobar = ((atomic_t) { (1) });
would be invalid for standard C, because the initializer would not be a compile time constant expression. (In file scope the compound literal would be statically allocated so it would work, there.)
I remember vaguely a discussion on the kernel list that mentioned that gcc has an extension that allows such code, and that this is one of the reasons they don't move on to C99 but stick to gnu89 as a C dialect.
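The difference between the two macro forms can be sketched in plain C99 with a stand-in type. my_atomic_t and the MY_ATOMIC_INIT_* names are invented for this example and provide no actual atomicity; they only mirror the shape of the kernel definitions quoted in the question.

```c
/* Stand-in for atomic_t: a struct wrapping a single int (not really atomic). */
typedef struct { int counter; } my_atomic_t;

#define MY_ATOMIC_INIT_BRACES(i)  { (i) }                 /* initializer only */
#define MY_ATOMIC_INIT_LITERAL(i) ((my_atomic_t) { (i) }) /* compound literal */

/* The brace form is a valid initializer for a block-scope static object. */
int next_ticket(void)
{
    static my_atomic_t ticket = MY_ATOMIC_INIT_BRACES(1);
    return ticket.counter++;
}

/* The compound-literal form is an expression, so it also works in an
   assignment, where the brace form would be a syntax error. */
int reset_value(void)
{
    my_atomic_t v;
    v = MY_ATOMIC_INIT_LITERAL(5);
    return v.counter;
}
```

Swapping the two macros between the functions shows the limits: the brace form does not compile in the assignment, and the compound literal is not a constant expression for the static initializer in strict standard C.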

Prettiest way to declare a C array either fixed size or variable size?

I am writing a small C code for an algorithm. The main target are embedded microcontrollers, however, for testing purposes, a Matlab/Python interface is required.
I am following an embedded programming standard (MISRA-C 2004), which requires the use of C90, and discourage the use of malloc and friends. Therefore, all the arrays in the code have their memory allocated at compile time. If you change the size of the input arrays, you need to recompile the code, which is alright in the microcontroller scenario.
However, when prototyping with Matlab/Python, the size of the input arrays change rather often, and recompiling every time does not seem like an option. In this case, the use of C99 is acceptable, and the size of the arrays should be determined in runtime.
The question is: what options do I have in C to make these two scenarios coexist in the same code, while keeping the code clean?
I must emphasize that my main concern is how to make the code easy to maintain. I have considered using #ifdef to either take the statically allocated array or the dynamically allocated array. But there are too many arrays, and I think #ifdef makes the code look ugly.
I've thought of a way that you can get away with only one #ifdef. I would personally just bite the bullet and recompile my code when I need to. The idea of using a different dialect of C for production and test makes me a bit nervous.
Anyway, here's what you can do.
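#ifdef EMBEDDED
#define ARRAY_SIZE(V,S) (S)   /* fixed size on the embedded (C90) build */
#else
#define ARRAY_SIZE(V,S) (V)   /* run-time size (C99 VLA) on the prototyping build */
#endif
int myFunc(int n)
{
    int myArray[ARRAY_SIZE(n, 6)];
    /* work with myArray */
    return 0;
}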
The ARRAY_SIZE macro chooses the variable V, if not in the embedded environment; or the fixed size S, if in the embedded environment.
MISRA-C:2004 forbids C99 and thereby VLAs, so if you are writing strictly-conforming MISRA code you can't use them. It is also very likely that VLAs will be explicitly banned in the upcoming MISRA-C standard.
Is it not an option to use statically allocated arrays whose size is deduced from the initializer? That is:
uint8_t arr[] = { ... };
...
n = sizeof(arr)/sizeof(arr[0]);
This is most likely the "prettiest" way. Alternatively you can have a debug build in C99 with VLAs, and then change it to statically allocated arrays in the release build.
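A short sketch of the initializer-sized array idea, which stays within C90 rules for the array itself. The data values and the names ARR_LEN and sum_arr are made up; uint8_t assumes a stdint.h is available, as the answer's own snippet does.

```c
#include <stddef.h>
#include <stdint.h>

/* The compiler deduces the array length from the initializer. */
static const uint8_t arr[] = { 3u, 1u, 4u, 1u, 5u };

/* Element count as a compile-time constant; it follows the initializer
   automatically when entries are added or removed. */
#define ARR_LEN (sizeof(arr) / sizeof(arr[0]))

/* Example consumer that never hard-codes the length. */
unsigned int sum_arr(void)
{
    unsigned int sum = 0u;
    size_t i;

    for (i = 0u; i < ARR_LEN; ++i) {
        sum += arr[i];
    }
    return sum;
}
```

Changing the input size still requires a recompile, but only the initializer needs editing.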
