char a_1[512];
int some_variable;
char a_2[512];
int main(void)
{
...
}
Here in the above program, I have declared some variables, all of which end up in the BSS section. Considering that I have kept the alignment issues in mind, can I be sure that the memory allocated for those 3 variables will always be contiguous?
Certainly not. Read the C11 standard n1570, and you won't find any guarantee about that.
Different compilers are likely to order variables differently, in particular when they are optimizing. Some variables might even stay in a register and never get a memory location at all. In practice, some compilers follow the order of the source; others use a different order.
In practice you could customize (perhaps with some pain) your GCC or your Clang compiler to change that order, and this does happen: for example, recent versions of the Linux kernel can be configured to be built with a GCC plugin that randomizes the order of certain structure fields. With GCC or Clang you might also add a variable attribute to alter that order.
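If you are curious what your own toolchain actually does, a quick (and purely implementation-specific) check is to print the addresses; a minimal sketch:
#include <stdio.h>

char a_1[512];
int some_variable;
char a_2[512];

int main(void)
{
    /* Whatever ordering and gaps you see here were chosen by the compiler
       and linker; the C standard does not require them to match the source. */
    printf("a_1           at %p\n", (void *)a_1);
    printf("some_variable at %p\n", (void *)&some_variable);
    printf("a_2           at %p\n", (void *)a_2);
    return 0;
}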
BTW, if you need some specific order, you could put the variables as fields of a single struct, e.g.:
struct {
    char a_1[512];
    int some_variable;
    char a_2[512];
} my_struct;

#define a_1 my_struct.a_1
#define some_variable my_struct.some_variable
#define a_2 my_struct.a_2
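Inside such a struct the members are laid out in declaration order (only padding may be inserted between them), so the relative layout is now guaranteed. You can check it with offsetof; a small sketch using a named struct just for illustration:
#include <stdio.h>
#include <stddef.h>

struct layout {
    char a_1[512];
    int some_variable;
    char a_2[512];
};

int main(void)
{
    /* Members appear in declaration order; only padding may be added. */
    printf("a_1           at offset %zu\n", offsetof(struct layout, a_1));
    printf("some_variable at offset %zu\n", offsetof(struct layout, some_variable));
    printf("a_2           at offset %zu\n", offsetof(struct layout, a_2));
    return 0;
}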
BTW, some old versions of GCC had an optional optimization pass which reordered (in some cases) fields in struct-s (but recent GCC removed that optimization pass).
In a comment (which should go into your question) you mention hunting some bug. Consider using the gdb debugger and its watchpoints (and/or valgrind). Don't forget to enable all warnings and debug info when compiling (so gcc -Wall -Wextra -g with GCC). You may also want instrumentation options like -fsanitize=address etc.
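For example, a watchpoint session might look like this (a sketch; prog.c stands for your program and some_variable is the variable you suspect is being overwritten):
gcc -Wall -Wextra -g prog.c -o prog
gdb ./prog
(gdb) watch some_variable      # stop whenever some_variable changes
(gdb) run
(gdb) backtrace                # see which code performed the write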
Beware of undefined behavior.
Related
In our project, we are using the ticlang compiler, i.e. a flavor of clang from TI.
Optimization is set to level -Os.
In the code we have variables that have a struct type and are only used within a C file and hence are defined as static struct_type_xy variable;
The compiler performs some optimization where the members of such a struct are not kept in sequence in one block of memory but are re-ordered and even split.
This means that while debugging such variables cannot be displayed properly.
Of course, I could define them as volatile, but that would also prevent optimizing multiple accesses to the same members, which I don't want to happen.
Therefore I want to prevent this kind of optimization.
What is the name of such an optimization and how can I disable it in clang?
I don't have an MCVE yet, but I can provide a few details:
typedef struct
{
    Command_t Command;   // this is an enum type
    int Par_1;           // System uses 32 bit integers.
    int Par_2;
    int Par_3;
    int Par_4;
    size_t Num_Tok;
} Cmd_t;
static Cmd_t Cmd;
The map file then contains:
20000540 00000004 Cmd.o (.bss.Cmd.1)
20000544 00000004 Cmd.o (.bss.Cmd.2)
20000548 00000004 Cmd.o (.bss.Cmd.5)
2000054c 00000004 HAL_*
...
2000057b 00000001 XY_*
2000057c 00000001 Cmd.o (.bss.Cmd.0)
The parts of Cmd are split across memory and some are even removed. (I used a build configuration in which the missing 2 members are not used, but the struct definition is identical for all configurations.)
If I remove static, this changes to:
200004c4 00000018 (.common:Cmd)
Clang is apparently scalarizing the static struct, breaking it up into separate members, since the address is never taken or used, and doesn't escape the compilation unit. This lets it optimize away unused members.
LLVM has a "Scalar Replacement of Aggregates" (sroa) optimization pass.
https://llvm.org/docs/Passes.html#sroa-scalar-replacement-of-aggregates
(The alloca mentioned in that doc is an LLVM IR instruction, not the C alloca() function. Also, google found a random copy of the LLVM source that implements this while I was trying to find the right search terms.)
clang -O3 -Rpass=sroa might print a "remark" for each struct it optimizes, if that pass supports optimization reports.
According to Clang optimization levels, -sroa is enabled at -O1 and higher. But -sroa isn't a clang option, nor is it an LLVM option for clang -mllvm -sroa. In 2011, someone asked about adding a command-line option to disable an arbitrary optimization pass; IDK if any such feature ever got added.
clang -cc1 -mllvm -help-list-hidden does show some interesting option names, like --stop-before=<pass-name> and --start-after=<pass-name>, and there's a --sroa-strict-inbounds.
clang -mllvm --sroa-strict-inbounds -O1 does actually compile, but I don't know what it does.
clang -mllvm --stop-before=sroa -O3 hello.c doesn't work on my system with clang 13. Or with --stop-before=-sroa. I get error in backend: "sroa" pass is not registered.
So I don't know how to actually disable this optimization pass, but that's almost certainly the one responsible. This is as far as I've gotten.
It's enabled at -O1, so it's not viable to use a lower optimization level and enable the other optimization flags it normally implies. -O0 is special and marks everything as optnone, to make sure code-gen is suitably literal, storing/reloading everything between C statements.
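One workaround that usually defeats this kind of optimization (a sketch under my assumptions, not something documented for ticlang specifically): make the address of the struct escape to code the optimizer cannot see, for example by passing it to a function defined in another translation unit. The names Debug_KeepAlive and Cmd_Init below are hypothetical. Once the aggregate's address escapes, the compiler has to keep it as one object in memory; note that link-time optimization could see through this again.
/* declared here, defined in some other .c file the optimizer cannot inspect */
extern void Debug_KeepAlive(const void *p);

static Cmd_t Cmd;

void Cmd_Init(void)
{
    Debug_KeepAlive(&Cmd);   /* &Cmd escapes, so Cmd can no longer be split */
    /* ... normal use of Cmd ... */
}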
Why would I want this?
I'd like to use a C package that was initially built in 2007 and last updated in 2016 according to the Changelog. I gather that it would have compiled cleanly back then.
Sadly, this is no longer the case.
The error
Running ./configure and make, I get a multiple definition error:
gcc -g -O2 -o laplaafit laplaafit.o multimin.o common.o lib/libgnu.a -lgsl -lgslcblas -lm
/usr/bin/ld: common.o:/home/<user>/build/subbotools/subbotools-1.3.0/common.c:27: multiple definition of `Size'; laplaafit.o:/home/<user>/build/subbotools/subbotools-1.3.0/laplaafit.c:38: first defined here
/usr/bin/ld: common.o:/home/<user>/build/subbotools/subbotools-1.3.0/common.c:26: multiple definition of `Data'; laplaafit.o:/home/<user>/build/subbotools/subbotools-1.3.0/laplaafit.c:37: first defined here
Specifically, both files (laplaafit.c and common.c) have the declaration
double *Data; /*the array of data*/
unsigned Size;/*the number of data*/
with a definition of both variables following further down in the code in both files (I believe via load(&Data,&Size,infile); this calls the function int load() in common.c, which reads the array *Data from a file and determines its length Size).
This is what causes the error. The variables are important in both files (removing them from either leads to '(variable)' undeclared errors). Moving them to a header file would not change anything if the header (say common.h) is included in both .c files.
Edit: Since it was raised in the comments that load(&Data,&Size,infile); is "far from being a definition" I figure I should be a bit more detailed.
load(&Data,&Size,infile);
calls the int load(...) function from common.c
int load(double **data,unsigned *size,FILE *input)
Here, *Data is the array starting at address Data. &Data is the pointer to the pointer (a double pointer) to the start of the array. **data is such a double pointer, local to load(). If the function obtains &Data for this parameter, data actually refers to the original global array and the program can write into it by accessing it via the pointer *data.
And *size (for which the function obtains &Size) is the value at address &Size, i.e. the other global variable.
The function then writes into *data and *size multiple times, e.g., in the very end:
*size=i;
*data = (double *) my_realloc((void *) *data,(*size)*sizeof(double));
If I am not mistaken, this may count as the global variables *Data and Size being defined.
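To illustrate just the mechanism (a made-up minimal sketch, not the package's actual load()): because the callee receives the addresses of the globals, its assignments through *data and *size update the caller's Data and Size.
#include <stdio.h>
#include <stdlib.h>

double *Data;    /* global array pointer, filled by load() */
unsigned Size;   /* global element count, set by load()    */

int load(double **data, unsigned *size, FILE *input)
{
    unsigned i = 0;
    double v;
    while (fscanf(input, "%lf", &v) == 1) {
        /* grow the caller's array; error handling omitted in this sketch */
        *data = realloc(*data, (i + 1) * sizeof **data);
        (*data)[i++] = v;
    }
    *size = i;   /* writes the caller's global Size */
    return (int)i;
}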
Furthermore, the comment says that I do not actually know enough C to diagnose the program and that I should therefore rather hire someone who does. This would raise the bar for being allowed to post in Stackoverflow to a very high level; a level that is not always attained in the questions that are commonly posted and seen as perfectly acceptable. It may actually be a reasonable suggestion, but it would leave no place for me to ask questions I might have about C or any other language. If the author of the comment is serious about this, it may be worth posting in Meta and suggesting splitting Stackoverflow in two, one for experts, one for everyone else.
How to solve the problem (making the code compile)
As I see it, there are two ways to approach this:
Rewrite the software package to avoid multiple definitions. I would ideally like to avoid this.
Find a way to compile as it would have been compiled between 2007 and 2016. I assume it would have compiled cleanly back then. There are multiple potential problems with this: Would the old compiler still work with my 2021 system? Would that work with the libraries in a modern system? Even if I succeed, will the resulting executable behave as it would have been intended by the authors? Still, this seems the preferable option.
It is also still possible that I misinterpret the error or misunderstand something.
It is also possible that even between 2007 and 2016 this would not have compiled cleanly with my compiler (gcc) and that the authors used a different compiler that accepts multiple definitions.
Solution by compiling with old compiler behavior
Include the -fcommon option as discussed in kaylum's answer below.
Attempt to solve by changing the code
The intended behavior is obviously for the two variables Data and Size in the two files to refer to the same objects (the same locations in memory). Therefore, declaring the variables as extern in laplaafit.c should recover the same behavior. Specifically, exchanging
double *Data; /*the array of data*/
unsigned Size;/*the number of data*/
for
extern double *Data; /*the array of data*/
extern unsigned Size;/*the number of data*/
The code then compiles cleanly again. I am not sure, though, how certain I can be that the behavior is actually the same as intended by the authors (and achieved with old gcc versions and with recent gcc plus -fcommon).
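For reference, the textbook form of this fix (a sketch; the common.h header here is hypothetical, modeled on the file names above) is to declare the variables extern in the header and define them in exactly one .c file:
/* common.h */
extern double *Data;    /* the array of data */
extern unsigned Size;   /* the number of data */

/* common.c - the single definition */
double *Data;
unsigned Size;

/* laplaafit.c - include the header, no definition here */
#include "common.h"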
Why I think this question is of general interest for programming (and this belongs on Stackoverflow)
I am guessing, however, that the question is more general. There are many old software packages around. Given enough time, most of them will break eventually.
Software
My system is Arch Linux kernel 5.11.2; C compiler: gcc 10.2.0; GNU Make 4.3.
Multiple definitions of global variables of the same type with the same name are permitted in gcc if the source is built with -fcommon. From the gcc manual:
-fcommon places uninitialized global variables in a common block. This allows the linker to resolve all tentative definitions of the same variable in different compilation units to the same object, or to a non-tentative definition. This behavior is inconsistent with C++, and on many targets implies a speed and code size penalty on global variable references. It is mainly useful to enable legacy code to link without errors.
The default of pre-10 gcc used to be -fcommon but that has been changed to -fno-common in gcc 10. From the gcc 10 release notes:
GCC now defaults to -fno-common. As a result, global variable accesses are more efficient on various targets. In C, global variables with multiple tentative definitions now result in linker errors. With -fcommon such definitions are silently merged during linking.
This explains why the build fails in your environment using gcc 10 but was able to build with older gcc versions. Your options are to either add -fcommon into the build or use a gcc version prior to 10.
Or, as pointed out by @JohnBollinger, another option is to fix the code to remove those multiple definitions and make the code conform strictly to the C standard.
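A minimal illustration of the difference (assuming two throwaway files a.c and b.c):
/* a.c */
int x;                        /* tentative definition */
int main(void) { return 0; }

/* b.c */
int x;                        /* another tentative definition of the same name */

gcc a.c b.c            # gcc 10 and later: linker error "multiple definition of `x'"
gcc -fcommon a.c b.c   # links cleanly: the tentative definitions are merged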
Many questions about forcing the order of functions in a binary to match the order of the source file
For example, this post, that post and others
I can't understand why gcc would want to change their order in the first place.
What could be gained from that?
Moreover, why is the default value of toplevel-reorder true?
GCC can change the order of functions because the C standard (e.g. n1570 or newer) allows it to do that.
There is no obligation for GCC to compile a C function into a single function in the sense of the ELF format. See elf(5) on Linux.
In practice (with optimizations enabled: try compiling foo.c with gcc -Wall -fverbose-asm -O3 foo.c, then look into the emitted foo.s assembler file), the GCC compiler builds intermediate representations like GIMPLE. A great many optimizations transform GIMPLE into better GIMPLE.
Once the GIMPLE representation is "good enough", the compiler transforms it into RTL.
On Linux systems, you could use dladdr(3) to find the nearest ELF function to a given address. You can also use backtrace(3) to inspect your call stack at runtime.
GCC can even remove functions entirely, in particular static functions whose calls would be inline expanded (even without any inline keyword).
I tend to believe that if you compile and link your entire program with gcc -O3 -flto -fwhole-program, some non-static but unused functions can be removed too.
And you can always write your own GCC plugin to change the order of functions.
If you want to guess how GCC works: download and study its source code (since it is free software) and compile it on your machine, invoke it with GCC developer options, ask questions on GCC mailing lists...
See also the bismon static source code analyzer (some work in progress which could interest you), and the DECODER project. You can contact me by email about both. You could also contribute to RefPerSys and use it to generate GCC plugins (in C++ form).
What could be gained from that?
Optimization. If the compiler thinks some code is likely to be used a lot, it may put that code in a different region than code which is not expected to execute often (or is an error path, where performance is not as important). And code which is likely to execute after, or temporally near, some other code should be placed nearby, so it is more likely to be in cache when needed.
__attribute__((hot)) and __attribute__((cold)) exist for some of the same reasons.
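For instance, the programmer can give such hints explicitly (a sketch; the function names are made up):
/* GCC/Clang attributes: hints about how often these functions are expected
   to run, influencing placement and optimization of the surrounding code. */
__attribute__((cold)) void report_fatal_error(const char *msg);    /* rare error path */
__attribute__((hot))  void process_packet(const unsigned char *p); /* hot inner loop  */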
why is the default value of toplevel-reorder true?
Because 99% of developers are not bothered by this default, and it makes programs faster. The 1% of developers who need to care about ordering use the attributes, profile-guided optimization or other features which are likely to conflict with no-toplevel-reorder anyway.
I'm aware that in C you may write beyond the end of allocated memory, and that instead of crashing this just leads to undefined behaviour, but somehow after testing many times, even with loops, and other variables, the output is always exactly as expected.
Specifically, I've been writing to an integer beyond the bounds of malloc(1), as such.
int *x = malloc(1);
*x = 123456789;
It's small enough to fit in 4 bytes (my compiler warns me that it will overflow if it's too large, which makes sense), but still clearly larger than one byte; however, it still somehow works. I haven't been able to run a single test that didn't either work in a very "defined"-looking manner or segfault immediately. Such tests include repeatedly recompiling and running the program and outputting the value of x, trying to write over it with a giant array, and trying to write over it with an array of length 0, going beyond its boundaries.
After seeing this, I immediately went and tried to edit a string literal, which should be read-only. But somehow, it worked, and seemed consistent also.
Can someone recommend a test I may use to demonstrate undefined behaviour? Is my compiler (Mingw64 on Windows 10) somehow doing something to make up for my perceived stupidity? Where are the nasal demons?
The term "Undefined Behavior" embodies two different concepts: actions whose behavior isn't specified by anything, and actions whose behavior isn't specified by the C Standard, but is specified by many implementations. While some people, including the maintainers of some compilers, refuse to acknowledge the existence of the second category, the authors of the Standard described it explicitly:
Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior.
On most implementations, your program would be an example of the first kind. Implementations will typically, for their own convenience, pad small allocation requests up to a certain minimum size, and will also pad larger allocation requests if needed to make them a multiple of a certain size. They generally do not document this behavior, however. Your code should only be expected to behave meaningfully on an implementation which documents the behavior of malloc in sufficient detail to guarantee that the requisite amount of space will be available; on such an implementation, your code would invoke UB of the second type.
Many kinds of tasks would be impossible or impractical without exploiting the second kind of UB, but such exploitation generally requires disabling certain compiler optimizations and diagnostic features. I can't think of any reason why code that wanted space for 4 bytes would only malloc one, unless it was designed to test the behavior of an allocator which would use the storage immediately past the end of an allocation for a particular documented purpose.
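If the goal is simply to make such an overflow visible, one practical test (assuming your toolchain ships AddressSanitizer; recent GCC and Clang on Linux do, MinGW support varies) is to compile the snippet from the question with it enabled:
gcc -g -fsanitize=address overflow.c -o overflow   # overflow.c holds the malloc(1) snippet
./overflow    # AddressSanitizer reports a heap-buffer-overflow at the store through x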
One of the trademarks of undefined behavior is that the same code can behave differently on different compilers or with different compiler settings.
Given this code:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int *x = malloc(1);
    x[100000] = 123456789;
    return 0;
}
If I compile this on my local machine with -O0 and run it, the code segfaults. If I compile with -O3, it doesn't.
[dbush#centos72 ~]$ gcc -O0 -Wall -Wextra -o x1 x1.c
[dbush#centos72 ~]$ ./x1
Segmentation fault (core dumped)
[dbush#centos72 ~]$ gcc -O3 -Wall -Wextra -o x1 x1.c
[dbush#centos72 ~]$ ./x1
[dbush#centos72 ~]$
Of course, this is just on my machine. Yours may do something entirely different.
Is it OK to say that the 'volatile' keyword makes no difference if compiler optimization is turned off, i.e. gcc -O0 ...?
I made a sample C program and see a difference between volatile and non-volatile in the generated assembly code only when compiler optimization is turned on, i.e. gcc -O1 ....
No, there is no basis for making such a statement.
volatile has specific semantics that are spelled out in the standard. You are asserting that gcc -O0 always generates code such that every variable -- volatile or not -- conforms to those semantics. This is not guaranteed; even if it happens to be the case for a particular program and a particular version of gcc, it could well change when, for example, you upgrade your compiler.
Probably volatile does not make much difference with gcc -O0, for GCC 4.7 or earlier. However, this is probably changing in the next version of GCC (i.e. the future 4.8, the current trunk). And the next version will also provide -Og to get debug-friendly optimization.
In GCC 4.7 and earlier, no optimization means that values are not always kept in registers from one C (or even GIMPLE, the internal representation inside GCC) instruction to the next.
Also, volatile has a specific meaning, both for standard-conforming compilers and for humans. For instance, I would be upset when reading some code with a sig_atomic_t variable which is not volatile!
BTW, you could use the -fdump-tree-all option of GCC to get a lot of dump files, or use the MELT domain-specific language and plugin, notably its probe, to query the GCC internal representations through a graphical interface.
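If you want to see the difference yourself, a classic example (a sketch) is a busy-wait loop; compare the assembly emitted with and without the volatile qualifier:
/* spin.c - compile with "gcc -O0 -S spin.c" and "gcc -O2 -S spin.c" and diff the spin.s files */
volatile int flag;   /* try deleting 'volatile' and compare again */

void wait_for_flag(void)
{
    /* With volatile, every iteration must re-load flag from memory.
       Without it, gcc at -O1/-O2 may hoist the load and emit an infinite loop;
       at -O0 the code often looks much the same either way, which is exactly
       what the question observed. */
    while (flag == 0)
        ;
}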