Can static/dynamic linker cause LLVM global-object aliasing? - linker

I'm working on code which analyses LLVM IR using LLVM's C++ API, and I'm trying to figure out when two or more llvm::GlobalObject values might end up naming the same piece of memory. I need to never reach a false-negative conclusion.
AFAIK, llvm::Module::aliases() enumerates the module's explicit aliases, which is a start. My concern is that additional aliasing might be introduced later, by the static or the dynamic linker.
Consider this example:
@A = common global i32 0, align 4
@B = external global i32
@C = external dllimport global i32, align 4
...
I'm concerned that the static linker may have the freedom to bind @A and @B to the same piece of storage, and/or that the dynamic linker might bind @A and @C to the same piece of storage.
Does anyone know if these are realistic possibilities? And am I missing any other ways that post-compilation steps could cause two GlobalObjects to alias?
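For reference, here is what an explicit alias looks like at the source level, written in C so it can be compiled directly. This is only a sketch: it assumes GCC or Clang on an ELF target, where the alias attribute is available, and the names counter, counter_alias, store_via_alias and load_via_original are made up for illustration. Clang lowers such a declaration to an alias entry in the IR, which is the kind of aliasing the module already exposes. It does not demonstrate linker-introduced aliasing, only that two distinct symbol names can denote one object.

```c
/* Two symbol names for one object, via the GCC/Clang `alias` attribute.
   Assumes an ELF target; the attribute is not available on every platform. */
int counter = 0;
extern int counter_alias __attribute__((alias("counter")));

/* Store through the second name... */
void store_via_alias(int value)
{
    counter_alias = value;
}

/* ...and observe the store through the first name. */
int load_via_original(void)
{
    return counter;
}
```

Calling store_via_alias(42) and then load_via_original() reads back the stored value, because both names refer to the same storage.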

Related

Implicitly declare all variables volatile

Does gcc have an option to disable read/write optimizations for global variables not explicitly defined as volatile?
My team is running out of program memory in our embedded C project, built using gcc. When I enable optimizations to reduce code size, the code no longer works as expected because we have not been using the volatile keyword where we ought to have been. That is, I was able to resolve the presenting problem by declaring a few variables accessed in ISRs volatile. However, I don't have any level of certainty that those are the only variables I need to declare volatile and I just haven't noticed the other bugs yet.
I have heard that "some compilers" have a flag to implicitly declare everything volatile, but that I should resist the temptation because it is a "substitute for thought" (see https://barrgroup.com/Embedded-Systems/How-To/C-Volatile-Keyword).
Yes, but thought is expensive. Feel free to try to talk me out of it in the comments section, but I'm hoping for a quick fix to get the code size down without breaking the application.
You mean something besides -O0?
I expect that it's not too difficult to hack GCC for a quick experiment. This place in grokdeclarator in gcc/c/c-decl.c is probably a good place to unconditionally inject a volatile qualifier for the interesting storage_class values (probably csc_none, csc_extern, csc_static in your case).
/* It's a variable. */
/* An uninitialized decl with `extern' is a reference. */
int extern_ref = !initialized && storage_class == csc_extern;
type = c_build_qualified_type (type, type_quals, orig_qual_type,
orig_qual_indirect);
The experiment should tell you whether this is feasible from a performance/code size point of view and if it is, you might want to submit this as a proper upstream patch.
One way to do it is to redefine all the basic types as the same types with the volatile specifier. However, if every variable in your code is volatile, I expect the application to be larger than it was before you enabled optimizations.
My solution is: enable optimizations for only part of the code. If your application has a modular architecture, you can enable optimizations one module at a time and test that each piece of functionality still works. That is much easier than optimizing everything at once and then analyzing why nothing works.
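The underlying problem in the question, a variable shared between an ISR and the main loop, can be sketched as follows. This is a minimal illustration, not the asker's code: adc_isr, take_sample and both variables are hypothetical names, and on real hardware the handler would be hooked up through the vector table or a compiler-specific attribute. It also ignores atomicity, which matters when the shared data is wider than the CPU's native access size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared between the (hypothetical) ISR and the main loop, so both are
   volatile: the compiler must perform every access as written. */
static volatile bool sample_ready = false;
static volatile uint16_t latest_sample = 0;

/* Stand-in for an interrupt handler; takes the raw value as a parameter
   so the sketch can be exercised on a host machine. */
void adc_isr(uint16_t raw)
{
    latest_sample = raw;
    sample_ready = true;
}

/* Polled from the main loop. Returns true and stores the sample in *out
   if one is pending, false otherwise. */
bool take_sample(uint16_t *out)
{
    if (!sample_ready) {
        return false;
    }
    *out = latest_sample;
    sample_ready = false;
    return true;
}
```

Without volatile, an optimizing compiler may keep sample_ready in a register inside a polling loop and never see the ISR's write, which is the kind of breakage described in the question.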

How do labels and dd declarations work in NASM? What's the C equivalent?

I'm trying to understand what'd be the C equivalent of some nasm idioms like these ones:
%define CONSTANT1 1
%define CONSTANT2 2
1) section name_section data align=N
v1: dd 1.2345678
v2: dd 0x12345678
v3: dd 32767
v4:
v5: dd 1.0
v6:
dd 1.0, 2.0, 3.0, 4.0,
dd 5.0, 6.0, 7.0, 8.0
2) section name_section bss align=N
v7:
resd 1
3) global _function_name@0
section name_section code align=N
_function_name@0:
...
4) global _g_structure1
global _g_structure2
section name_section data align=N
_g_structure1:
dw 01h
dw 2
_g_structure2:
dd CONSTANT1
dd CONSTANT2
5) section section_name code align=N
function_name:
...
The nasm documentation here and here didn't clarify too much. Guess my questions are:
How are dd and similar directives interpreted?
It seems you can declare N sections of type {code, bss, data} with X bytes alignment, what's the meaning of that in C?
There are functions with the @N suffix, what's the meaning of that?
global... do you declare global labels? In what scope? NASM files?
v4: is empty, what does that mean?
dd stores a sequence of DWORDS given by the arguments. So dd 1 will store the 4-byte value 0x00000001 at the current location (since it's targeting a little endian architecture, you'll end up with the bytes 0x01 0x00 0x00 0x00).
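A rough C counterpart of a single dd line is a named 4-byte object. The sketch below assumes uint32_t as the DWORD type; dword_bytes is a hypothetical helper that copies the bytes out in memory order so you can see what a hex dump would show.

```c
#include <stdint.h>
#include <string.h>

/* Rough C counterpart of `v2: dd 0x12345678`: one named 4-byte object. */
uint32_t v2 = 0x12345678;

/* Copy the bytes of a DWORD into out[] in memory order. */
void dword_bytes(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);
}
```

On a little-endian machine such as x86, dword_bytes(1, b) leaves b[0] equal to 0x01 and the other three bytes zero, matching the byte order described above.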
Sections aren't generally exposed directly in C - it's more of a lower level concern handled by compilers, linkers and runtime loaders. So in general your toolchain will handle the proper allocation of your code and data into sections. For example, the compiler will put the actual assembled code into .text sections, and will put statically initialized data into .data sections, and finally will put uninitialized or zero-initialized statically allocated data into .bss sections, and so on. The details aren't really part of C itself and will vary by platform and executable format (for example, not all platforms have the same types of sections).
When using assembly, on the other hand, you need to be a bit more aware of sections. For example, if you have mutable data it is important that it ends up in a different section than your code, since you don't want to run into read-only .text sections, or self-modifying-code false positives, etc.
The section alignment is a directive to the runtime loader that tells it the minimum required alignment for the section. You can influence this from your C code using some compiler- or platform-specific options, e.g. if you request a statically allocated array to have an alignment of 32, then the .data section may be promoted to at least 32-byte alignment. Before C11, C had no standard way to request such alignment, so code relied on extensions such as gcc's aligned attribute (posix_memalign only covers dynamically allocated memory, and #pragma pack can only lower alignment). C11 added _Alignas (spelled alignas via <stdalign.h>), and C++11 has alignas.
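As an illustration of the alignment request mentioned above, here is a sketch using gcc's aligned attribute (also accepted by Clang). The array name table and the helper table_is_32_byte_aligned are made up for the example.

```c
#include <stdint.h>

/* GCC/Clang extension: request 32-byte alignment for a statically
   allocated array. The section holding it must then be at least
   32-byte aligned as well. */
static float table[8] __attribute__((aligned(32))) = {
    1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f
};

/* Returns 1 if the array's address is a multiple of 32, else 0. */
int table_is_32_byte_aligned(void)
{
    return ((uintptr_t)table % 32u) == 0;
}
```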
The @N suffix is a result of stdcall name mangling, where N is the number of bytes of arguments the function takes.
You can declare global labels with the help of the GLOBAL directive in nasm. As Peter points out, this only modifies the attributes of a subsequently declared label, and doesn't actually declare the label itself (which is still done in the usual way). This directive has other format-specific options which let you, for example, declare your exported symbol as a function.
The NASM global label directive does not actually declare the label. It just modifies what scope it will have when you do declare it, with label:.
It's the opposite of C, where global is the default and you have to use static to get non-exported symbols that are private to this compilation unit.
v4: is empty, what does that mean?
Think of labels as zero-width pointers. The label itself has no size, it just labels that position in the binary. (And you can have multiple labels at the same location).
NASM has no types, so it's really quite similar to void*.
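C has no direct way to attach several names to one address, but the idea can be approximated with pointers. In this sketch the struct block and the helper v4_matches_v5 are hypothetical; the layout mirrors v5: dd 1.0 followed by the eight floats at v6, with v4 naming the same position as v5.

```c
#include <stddef.h>

/* Storage laid out like `v5: dd 1.0` followed by the eight floats at v6. */
struct block {
    float v5;
    float v6[8];
};

static struct block data = {
    1.0f, {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f}
};

/* `v4:` has nothing after it, so it names the same address as v5. */
static void *const v4 = &data;

/* Returns 1 when v4 and v5 refer to the same location. */
int v4_matches_v5(void)
{
    return v4 == (void *)&data.v5;
}
```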

How to use subsections in a c program?

Code that belongs to the same section but different subsections has its order of placement defined by the subsection number. I need to use this feature in a c program - i.e. I need two functions to be in the same section and in a particular order. GCC re-orders functions in the same section as it pleases, so that is why I need subsections. Here is the syntax for sections - I can't figure out how to specify subsections using the __attribute__ syntax.
void func1() __attribute__ ((section ("mysection")));
See Jester's comment below for assembly syntax. I am using gcc, so I am assuming gas assembler?
Here is a long explanation of why I have gotten to the point of needing subsections. Maybe one of my conclusions along the way was incorrect and you can help me avoid this.
Q: Why not create separate sections and load them contiguously?
A: I have a separate problem where I need to be able to figure out the exact beginning address of my functions ahead of time.
Q: Why do you need to know the address?
A: I want to align some code in my functions (not the function itself) to a particular alignment
Q: Why not use .align?
A: I have found that using .align inside a c function for some reason forces that function itself to be aligned to that value, and I do not want that - so I have come up with an ugly macro alternative to the .align directive:
b 1f
. = . + (1 << "#alignment") - (("#section_start" + .) & ((1 << "#alignment") - 1))
1:
Q: Why not use labels to calculate your current location? Or a label in the loader file?
A: Assembler doesn't let me - I have to use the dot operator.
Q: Tell me again why you need section_start here?
A: The dot operator is relative to the start of the section, it is not the absolute address
Q: Why are you trying this low level stuff in C this is dumb
A: I agree this is dumb, but play along.
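The arithmetic in the alignment macro above can be checked on a host machine with a small C function. This is a sketch of the expression only; pad_to_alignment is a hypothetical name, and addr stands for section_start plus the dot operator's value.

```c
#include <stdint.h>

/* Bytes to skip from `addr` to reach the next multiple of
   (1 << log2_align), using the same expression as the macro.
   If `addr` is already aligned this returns a full (1 << log2_align),
   not 0. */
uint32_t pad_to_alignment(uint32_t addr, unsigned log2_align)
{
    uint32_t size = (uint32_t)1 << log2_align;
    return size - (addr & (size - 1u));
}
```

For example, with a 16-byte alignment (log2_align of 4), address 0x1004 needs 12 bytes of padding, while the already aligned address 0x1000 gets a full 16 bytes.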
I can't figure out how to use subsections, but I believe this GCC option forces function order, and I seem to have at least one example where it fixes the ordering in my test. I am slightly concerned about having to set -fno-section-anchors as well (seems like you can't only use -fno-toplevel-reorder), but this might be the best workaround I have right now.
One problem with this approach is that I lose the ability to place each function in separate sections - which has the benefit of allowing me to use the linker script to calculate the end of functions (also useful to me).
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
-fno-toplevel-reorder -fno-section-anchors

How can I share an array between the C files of a library, without the array being visible to the outside? [duplicate]

I am writing a C (shared) library. It started out as a single translation unit, in which I could define a couple of static global variables, to be hidden from external modules.
Now that the library has grown, I want to break the module into a couple of smaller source files. The problem is that now I have two options for the mentioned globals:
Have private copies at each source file and somehow sync their values via function calls - this will get very ugly very fast.
Remove the static definition, so the variables are shared across all translation units using extern - but now application code that is linked against the library can access these globals, if the required declaration is made there.
So, is there a neat way for making private global variable shared across multiple, specific translation units?
You want the visibility attribute extension of GCC.
Practically, something like:
#define MODULE_VISIBILITY __attribute__ ((visibility ("hidden")))
#define PUBLIC_VISIBILITY __attribute__ ((visibility ("default")))
(You probably want to #ifdef the above macros, using some configuration tricks à la autoconf and other autotools; on other systems you would just have empty definitions like #define PUBLIC_VISIBILITY /*empty*/ etc...)
Then, declare a variable:
int module_var MODULE_VISIBILITY;
or a function
void module_function (int) MODULE_VISIBILITY;
Then you can use module_var or call module_function inside your shared library, but not outside.
See also the -fvisibility code generation option of GCC.
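Put together, the pieces above might look like the following sketch. The __GNUC__ test is one assumed way to do the #ifdef guard (Clang defines it too), and mylib_get_value is a hypothetical public function added to show the contrast with the hidden symbols.

```c
/* Fall back to empty definitions on compilers without the attribute. */
#if defined(__GNUC__)
#define MODULE_VISIBILITY __attribute__((visibility("hidden")))
#define PUBLIC_VISIBILITY __attribute__((visibility("default")))
#else
#define MODULE_VISIBILITY /* empty */
#define PUBLIC_VISIBILITY /* empty */
#endif

/* Shared by every translation unit of the library, but not exported
   from the shared object. */
MODULE_VISIBILITY int module_var = 0;

/* Internal helper, callable from any source file of the library. */
MODULE_VISIBILITY void module_function(int value)
{
    module_var = value;
}

/* Part of the public API (hypothetical name). */
PUBLIC_VISIBILITY int mylib_get_value(void)
{
    return module_var;
}
```

In a real library the declarations would live in an internal header shared by the library's source files, with only mylib_get_value declared in the public header.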
BTW, you could also compile your whole library with -Dsomeglobal=alongname3419a6 and use someglobal as usual; to really find it your user would need to pass the same preprocessor definition to the compiler, and you can make the name alongname3419a6 random and improbable enough to make the collision improbable.
PS. This visibility is specific to GCC (and probably to ELF shared libraries such as those on Linux). It won't work without GCC or outside of shared libraries, so it is quite Linux specific (even if a few other systems, perhaps Solaris with GCC, have it). Other compilers (e.g. clang from LLVM) probably support it too on Linux for shared libraries (not static ones). Actually, the real hiding (across the several compilation units of a single shared library) is done mostly by the linker (because ELF shared libraries permit that).
The easiest ("old-school") solution is to simply not declare the variable in the intended public header.
Split your library's header into "header.h" and "header-internal.h", and declare internal stuff in the latter one.
Of course, you should also take care to protect your library-global variable's name so that it doesn't collide with user code; presumably you already have a prefix that you use for the functions for this purpose.
You can also wrap the variable(s) in a struct, to make it cleaner since then only one actual symbol is globally visible.
You can obfuscate things with disguised structs, if you really want to hide the information as best as possible. e.g. in a header file,
struct data_s {
void *v;
};
And somewhere in your source:
struct data_s data;
struct gbs {
// declare all your globals here
} gbss;
and then:
data.v = &gbss;
You can then access all the globals via: ((struct gbs *)data.v)->
I know that this will not be what you literally intended, but you can leave the global variables static and divide them into multiple source files.
Put the functions that write to a given static variable in the same source file as that variable, and declare them static as well.
Declare functions that read the static variable so that external source files of the same module can read its value.
In a way, this makes it less global. If possible, the best logic for breaking big files into smaller ones is to make that decision based on the data.
If it is not possible to do it this way, then you can move all the global variables into one source file as static and access them from the other source files of the module through functions, making it official, so that if someone is manipulating your global variables at least you know how.
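A minimal sketch of the accessor approach for an array might look like this. TABLE_SIZE, table_set and table_get are illustrative names, not anything from the question. Note that the accessors themselves still have external linkage, so they remain callable by application code unless they are hidden by some other means.

```c
#include <stddef.h>

#define TABLE_SIZE 4 /* illustrative size */

/* The array stays private to this translation unit. */
static int table[TABLE_SIZE];

/* Write access; returns 0 on success, -1 if the index is out of range. */
int table_set(size_t index, int value)
{
    if (index >= TABLE_SIZE) {
        return -1;
    }
    table[index] = value;
    return 0;
}

/* Read access; stores the element in *out and returns 0,
   or returns -1 if the index is out of range. */
int table_get(size_t index, int *out)
{
    if (index >= TABLE_SIZE) {
        return -1;
    }
    *out = table[index];
    return 0;
}
```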
But then it probably is better to use @unwind's method.

Best practice on writing constant parameters for embedded systems

This is a case of "static const" vs "#define" in C for embedded systems.
On large/mid projects with "passed-down" code and modules, what is the best practice on writing constant parameters for your include files, modules, etc?
That is, in passed-down code you don't know whether the names you choose are already defined in some other included file, or might be referenced with extern or as macros in some other file that includes yours.
Having these 3 options:
static const int char_height = 12;
#define CHAR_HEIGHT 12
enum { char_height = 12 };
which one would be better (on an embedded system with unknown memory constraints)?
The original code mainly uses #defines for this, but these kinds of constants are haphazardly implemented in several ways (and at different locations even in the same files), since it seems several people developed this demo software for a certain device.
Specifically, this is a demo code, showing off every hardware and SDK feature of a certain device.
Most of the data I'm thinking about is the kind used to configure the environment: screen dimensions, charset characteristics, something to improve the readability of the code. Not on the automatic configuration a compiler and pre-processor could do. But since there's a lot of code in there and I'm afraid of global name conflicts, I'm reluctant to use #define's
Currently, I'm considering that it would be better to rewrite the project from scratch and re-implement most of the already written functions to get their constants from just one c file or reorganize the constants' implementation to just one style.
But:
This is a one person project (so it would take a lot of time to re-implement everything)
The already implemented code works and it has been revised several times. (If it's not broken...)
Always consider readability and memory constraints. Also, macros are simply copy/paste operations that occur before compilation. With that being said I like to do the following:
I define all variables that are constant as being static const if they are to be used in one c file (e.g. not globally accessible across multiple files). Anything defined as const shall be placed in ROM when at file scope. Obviously you cannot change these variables after they're initialized.
I define all constant values using #define.
I use enumerations where it adds to readability. Any place where you have a fixed range of values I prefer enumerations to explicitly state the intent.
Try to approach the project with an object oriented perspective (even though c isn't OO). Hide private functions (don't create a prototype in the header), do not use globals if you can avoid it, mark variables that should only reside in one c module (file) as static, etc.
They are 3 different things that should be used in 3 different situations.
#define should be used for constants that need to be evaluated at compile time. One typical example is the size of a statically allocated array, i.e.
#define N 10
int x[N];
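To make the compile-time requirement concrete, the sketch below sizes one array with a macro and one with an enumeration constant; both are integer constant expressions in C. A static const int is not one in standard C (before C23's constexpr), so it cannot portably size a file-scope array, although some compilers accept it as an extension. The names M, y, x_count and y_count are added for illustration.

```c
#define N 10     /* macro: plain text substitution, no object */
enum { M = 10 }; /* enumeration constant: an int constant expression */

/* Both are accepted as the size of a statically allocated array. */
static int x[N];
static int y[M];

/* Element counts, computed at compile time with sizeof. */
unsigned x_count(void)
{
    return (unsigned)(sizeof x / sizeof x[0]);
}

unsigned y_count(void)
{
    return (unsigned)(sizeof y / sizeof y[0]);
}
```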
It is also fine to use #define for all constants where it doesn't matter how or where the constant is allocated. People who claim that it is bad practice to do so only voice their own, personal, subjective opinions.
But of course, for such cases you can also use const variables. There is no important difference between #define and const, except for the following cases:
const should be used where it matters at what memory address a constant is allocated. It should also be used for variables that the programmer will likely change often, because if you use const, you can easily move the variable to a memory segment in EEPROM or data flash (but if you do so, you need to declare it as volatile).
Another slight advantage of const is that you get stronger type safety than a #define. For the #define to get equal type safety, you have to add explicit type casts in the macro, which might get a bit harder to read.
And then of course, since consts (and enums) are variables, you can reduce their scope with the static keyword. This is good practice since such variables do not clutter the global namespace. That said, name conflicts in the global namespace are in 99% of all cases caused by poor naming policies, or no naming policies at all. If you follow no coding standard, then that is the true source of the problem.
So generally it is fine to make constants global when needed, it is rather harmless practice as long as you have a sane naming policy (preferably all items belonging to one code module should share the same naming prefix). This shouldn't be confused with the practice of making regular variables global, which is always a very bad idea.
Enums should only be used when you have several constant values that are related to each other and you want to create a special type, such as:
typedef enum
{
OK,
ERROR_SOMETHING,
ERROR_SOMETHING_ELSE
} error_t;
One advantage of the enum is that you can use a classic trick to get the number of enumerated items as another compile-time constant "free of charge":
typedef enum
{
OK,
ERROR_SOMETHING,
ERROR_SOMETHING_ELSE,
ERRORS_N // the number of constants in this enum
} error_t;
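One common use of the trailing count is to size a lookup table that must stay in step with the enum. The sketch below is an illustration under assumptions: the type is renamed app_error_t to avoid clashing with the error_t that some C libraries define, and error_names and error_name are hypothetical.

```c
#include <stddef.h>

typedef enum
{
    OK,
    ERROR_SOMETHING,
    ERROR_SOMETHING_ELSE,
    ERRORS_N /* the number of constants in this enum */
} app_error_t;

/* One entry per enumerator; ERRORS_N sizes the table at compile time. */
static const char *const error_names[ERRORS_N] = {
    "OK",
    "ERROR_SOMETHING",
    "ERROR_SOMETHING_ELSE"
};

/* Returns the name for a code, or "UNKNOWN" if it is out of range. */
const char *error_name(app_error_t code)
{
    if ((unsigned)code >= ERRORS_N) {
        return "UNKNOWN";
    }
    return error_names[code];
}
```

If a new enumerator is added before ERRORS_N, the table grows automatically and any entry without an initializer becomes a null pointer, so the initializer list should be updated at the same time.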
But there are various pitfalls with enums, so they should always be used with caution.
The major disadvantage of enum is that it isn't type safe, nor is it "type sane". First of all, enumeration constants (like OK in the above example) are always of the type int, which is signed.
The enumerated type itself (error_t in my example) can however be of any type compatible with char or int, signed or unsigned. Take a guess, it is implementation-defined and non-portable. Therefore you should avoid enums, particularly as part of various data byte mappings or as part of arithmetic operations.
I agree with bblincoe...+1
I wonder if you understand what the differences are in that syntax and how it can/might affect implementation. Some folks may not care about implementation but if you are moving into embedded perhaps you should.
When bblincoe mentions ROM instead of RAM.
static const int char_height = 12;
That should, ideally, consume .text real estate and pre-init that real estate with the value you specified. Being const you won't change it, but it does have a placeholder. Now why would you need a placeholder for a constant? Think about that: certainly you could hack the binary down the road for some reason, to turn something on or off or change a board-specific tuning parameter...
Without a volatile, though, that doesn't mean that the compiler has to always use that .text location; it can optimize and put that value into instructions directly, or even worse, optimize math operations and remove some math.
The define and enum do not consume storage, they are constants that the compiler chooses how to implement, ultimately those bits if they are not optimized away, land somewhere in .text sometimes everywhere in .text, depends on the instruction set how its immediates work the specific constant, etc.
So define vs enum is basically do you want to pick all the values or do you want the compiler to pick some values for you, define if you want to control it enum if you want the compiler to choose the values.
So it really isnt a best practice thing at all it is a case of determining what your program needs to do and choosing the appropriate programming solution for that situation.
Depending on the compiler and the target processor, choosing volatile static const int vs not doing that can affect the rom consumption. But it is a very specific optimization, and not a general answer (and has nothing to do with embedded but with compiling in general).
Dan Saks explains why he prefers the enumeration constant in these articles, Symbolic Constants and Enumeration Constants vs Constant Objects. In summary, avoid macros because they don't observe the usual scope rules and the symbolic names are typically not preserved for symbolic debuggers. And prefer enumeration constants because they are not susceptible to a performance penalty that may affect constant objects. There are many more details in the linked articles.
Another thing to consider is performance. A #define constant can usually be accessed faster than a const variable (for integers) since the const will need to be fetched from ROM (or RAM) and the #define value will usually be an immediate instruction argument so it is fetched along with the instruction (no extra cycles).
As for naming conflicts, I like to use prefixes like MOD_OPT_ where MOD is the module name OPT means that the define is a compile-time option, etc. Also only include the #defines in your header files if they're part of the public API, otherwise use an .inc file if they're needed in multiple source files or define them in the source file itself if they're only specific to that file.
