How to specify default global variable alignment for gcc? - c

How do I get rid of alignment (.align 4 below) for all global variables by default with GCC, without having to specify __attribute__((aligned(1))) for each variable?
I know that what I ask for is a bad idea to apply universally, because on some architectures an alignment of 1 wouldn't work, e.g. because the CPU is unable to dereference an unaligned pointer. But in my case I'm writing an i386 bootloader, and unaligned pointers are fine (though slower) there.
Source code (a.c):
__attribute__((aligned(1))) int answer0 = 41;
int answer = 42;
Compiled with: gcc -m32 -Os -S a.c
Assembly output (a.s):
.file "a.c"
.globl answer
.data
.align 4
.type answer, @object
.size answer, 4
answer:
.long 42
.globl answer0
.type answer0, @object
.size answer0, 4
answer0:
.long 41
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4"
.section .note.GNU-stack,"",@progbits
The flag gcc -fpack-struct=1 changes the alignment of all struct members and structs to 1. For example, with that flag
struct x { char a; int b; };
struct y { int v : sizeof(char) + sizeof(int) == sizeof(struct x); };
struct z { int b; };
struct x x = { 1, 1 };
int i = 42;
struct z z = { 2 };
compiles to no alignment for the variables x and z, but it still emits an .align 4 for the variable i (of type int). I need a solution which also makes int i = 42; unaligned, without having to specify anything extra for each such variable.

IMO, packing variables into a packed struct to save space is the easiest and safest way.
example:
#include <stdio.h>
#include <stdint.h>
#define _packed __attribute__((packed))
_packed struct
{
uint8_t x1;
_packed int x2;
_packed uint8_t x3[2];
_packed int x4;
}byte_int;
int main(void) {
printf("%p %p %p %p\n", &byte_int.x1, &byte_int.x2, &byte_int.x3, &byte_int.x4);
printf("%u %u %u %u\n", (unsigned int)&byte_int.x1, (unsigned int)&byte_int.x2, (unsigned int)&byte_int.x3, (unsigned int)&byte_int.x4); // I know this is UB; shown only to print the addresses in decimal for the OP - easier to spot the odd and even addresses
return 0;
}
https://ideone.com/bY1soH

Most probably gcc doesn't have a flag that changes the default alignment of global variables.
gcc -fpack-struct=1 can be a workaround, but only for global variables which happen to be of struct type.
Post-processing the .s output of gcc and removing (some of) the .align lines could also work as a workaround.

Related

gcc optimized out unused variable when it should not

Consider the following code, which comes mostly from the Bluedroid stack:
#include <stdint.h>
#include <assert.h>
#define STREAM_TO_UINT16(u16, p) {u16 = ((uint16_t)(*(p)) + (((uint16_t)(*((p) + 1))) << 8)); (p) += 9;}
void func(uint8_t *param) {
uint8_t *stream = param;
uint16_t handle, handle2;
*stream = 5;
STREAM_TO_UINT16(handle, stream);
STREAM_TO_UINT16(handle2, stream);
assert(handle);
assert(handle2);
*stream = 7;
}
.file "opt.c"
.text
.align 4
.global func
.type func, @function
func:
entry sp, 32
movi.n a8, 5
s8i a8, a2, 0
movi.n a8, 7
s8i a8, a2, 18
retw.n
.size func, .-func
.ident "GCC: (crosstool-NG esp-2020r3) 8.4.0"
When it is compiled with NDEBUG, assert() resolves to nothing and "handle" is optimized out with -O2, -Os or -O3. As a result, the macro is not expanded and the pointer is not incremented.
I know that I can make "handle" volatile as one option to solve the issue, and I agree that modifying variables inside macros is dangerous, but this is not my code, this is Bluedroid.
Well, first, is this borderline a gcc bug, and second, is there a way to tell gcc not to optimize out an unused variable?
Oops ... no, I just re-read the Xtensa ISA and I was wrong: the value of a8 is stored where a2 points, with an offset, so this is correct. I need to look somewhere else, because the core of the problem is that as soon as I define NDEBUG, my Bluedroid stack (this is on ESP32) stops working, so I was searching for differences and looking where the compiler was whining (unused variables). Thanks for taking the time to answer.

Defining floating-point numbers in x86 assembly - C translation

Currently studying C. When I define, for example, an array such as:
float var1[2023] = {-53.3125};
what would the corresponding x86 assembly translation look like? I'm looking for the exact portion of code where the variable is defined, where the ".type", ".size" and alignment values are mentioned.
I've seen on the internet that when dealing with a floating-point number, the x86 assembly conversion will simply be ".long". However, I'm not sure to what extent that is correct.
One easy way to find out is to ask the compiler to show you:
// float.c
float var1[2023] = { -53.3125 };
then compile it:
$ gcc -S float.c
and then study the output:
.file "float.c"
.globl var1
.data
.align 32
.type var1, @object
.size var1, 8092
var1:
.long 3260366848
.zero 8088
.ident "GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-39)"
.section .note.GNU-stack,"",@progbits
Note that this is just GCC's implementation; clang does it differently:
.file "float.c"
.type var1,@object # @var1
.data
.globl var1
.align 16
var1:
.long 3260366848 # float -5.331250e+01
.long 0 # float 0.000000e+00
.long 0 # float 0.000000e+00
// thousands of these
.size var1, 8092
.ident "clang version 3.4.2 (tags/RELEASE_34/dot2-final)"
.section ".note.GNU-stack","",@progbits
EDIT - To answer the comment below: the use of .long simply lays down a specific bit pattern that encodes the compiler's idea of the floating-point format.
The value 3260366848 is the same as hex 0xC2554000, which is 11000010010101010100000000000000 in binary, and that binary value is what the CPU cares about. If you care to, you can get out your IEEE floating-point spec and decode this (there's the sign, that's the exponent, etc.), but all the details of the floating-point encoding were handled by the compiler, not the assembler.
I'm no kind of compiler expert, but decades ago I was tracking down a bug in a C compiler's floating-point support, and though I don't remember the details, it strikes me that having the compiler do this would have been helpful, saving me from having to use a disassembler to find out what bit pattern was actually encoded.
Surely others will weigh in here.
EDIT2 - Bits are bits, and this little C program (which relies on int and float having the same size) demonstrates it:
// float2.c
#include <stdio.h>
#include <memory.h>
int main()
{
float f = -53.3125;
unsigned int i;
printf("sizeof int = %lu\n", sizeof(i));
printf("sizeof flt = %lu\n", sizeof(f));
memcpy(&i, &f, sizeof i); // copy float bits into an int
printf("float = %f\n", f);
printf("i = 0x%08x\n", i);
printf("i = %u\n", i);
return 0;
}
Running it shows that bits are bits:
sizeof int = 4
sizeof flt = 4
float = -53.312500
i = 0xc2554000
i = 3260366848 <-- there ya go
These are just different displays of the same 32 bits, depending on how you look at them.
Now, to answer the question of how you would determine 3260366848 on your own from the floating-point value: you'd need to get out your IEEE standard and draw out all the bits manually (strong coffee recommended), then read those 32 bits as an integer.

Holding variables in a struct

I have a HIGHLY performance-critical section in my code where I need to minimize CPU load as much as I can. If I have a struct that has one instance, is there ANY difference in performance between defining variables in code by themselves, like this:
int something;
int randomVariable;
or defining them in a struct?
struct Test
{
int something;
int randomVariable;
}
because I want to use a struct to make the code look better
In my opinion, the best way to find out is to first write two different programs in C, with and without the struct, and then produce assembly files using
gcc -S file.c
Since I don't know your code, I directly assigned values to the variables:
int main() {
int something;
int randomVariable;
something = 3;
randomVariable = 3;
return 0;}
and
int main() {
struct Test
{
int something;
int randomVariable;
}test;
test.something = 3;
test.randomVariable = 3;
return 0;}
and I got the assembly files on my Ubuntu 64-bit, Intel i5 machine.
I saw that the assembly files are nearly the same:
.file "test1.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $3, -8(%rbp) # second one (with struct) has -16 instead of -8
movl $3, -4(%rbp) # second one has -12 instead of -4
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4"
.section .note.GNU-stack,"",@progbits
So according to those results, I can say that the two implementations have no significant difference in CPU load. The only difference is that the second one uses very slightly more memory than the first.
First, to be fair, because I want to use a struct to make the code look better is purely a style thing. What looks better to one person may not look better to another.
I am a fan of struct when there is a choice, for several reasons.
Speed/size efficiency:
Compare a struct against two discrete int variables when the data needs to be passed as a function argument.
Using:
int a;
int b;
Or
typedef struct {
int a;
int b;
}VAR;
VAR var;
The same data could be passed as separate pointers via function arguments (assuming 32 bit addressing):
int func1(int *a, int *b);//two 32 bit addresses passed
Or:
int func2(VAR *v);//one 32 bit address passed
The efficiency (of this type) goes up directly as the number of variables goes up. (Imagine the gain if there were 100 ints.)
In the first example you are passing two int *, while in the second, only one. It's a small difference, but it is a difference. The magnitude of the advantage also depends on the addressing used: 32-bit or 64-bit.
Code maintenance and readability:
Function prototypes, when used as an application programming interface (API), should be stable. Using a struct as an argument or as a return type supports this interface stability.
For example: given a requirement to calculate the changing velocity in Cartesian coordinates x, y & z of an object moving in a straight line with respect to time, you might design a function that would be called repeatedly with the current values of velocity(x, y, z), acceleration(x, y, z) and time(ms). The number of arguments required clearly suggests the use of a struct. A struct is also suggested as the return type:
typedef struct {
double x;
double y;
double z;
}VEL; //velocity
typedef struct {
double x;
double y;
double z;
}ACC; //acceleration
typedef struct {
VEL vel;
ACC acc;
time_t ms;
}KIN; //kinematics
KIN * GetVelocity(KIN *newV);
If a new requirement for knowing position(x, y, z) were added to the project, all that would have to be added is a new member of the KIN struct:
typedef struct {
double x;
double y;
double z;
}POS; //position
...
typedef struct {
POS pos;
VEL vel;
ACC acc;
time_t ms;
}KIN; //kinematics
KIN * GetVelocity(KIN *newV);//prototype will continue to work
//without changing interface (argument list)
KIN * GetPosition(KIN *newV);//new prototype for position also supported

C variable allocation in memory, pointers

How are variables located in memory? I have this code:
int w=1;
int x=1;
int y=1;
int z=1;
int main(int argc, char** argv) {
printf("\n w %d",&w);
printf("\n x %d",&x);
printf("\n y %d",&y);
printf("\n z %d",&z);
return (EXIT_SUCCESS);
}
and it prints this
w 134520852
x 134520856
y 134520860
z 134520864
We can see that as each additional integer is declared and assigned, the address moves up by four positions (bytes, I suppose, which seems logical). But if we don't initialize the variables, like in the next code:
int w;
int x;
int y;
int z;
int main(int argc, char** argv) {
printf("\n w %d",&w);
printf("\n x %d",&x);
printf("\n y %d",&y);
printf("\n z %d",&z);
return (EXIT_SUCCESS);
}
it prints this
w 134520868
x 134520864
y 134520872
z 134520860
We can see there are four positions between addresses, but they are not in order. Why is this? How does the compiler work in that case?
In case you want to know why I'm asking: I'm starting to study some security and trying to understand some attacks, for instance how integer overflow attacks work, and I've been playing with pointers in C to modify other variables by adding more positions than the size of the variable, and things like that.
Your first example initialises variables, which generates different allocation code. By looking into the assembly file generated by gcc (gas) I get:
.globl _w
.data
.align 4
_w:
.long 1
.globl _x
.align 4
_x:
.long 1
.globl _y
.align 4
_y:
.long 1
.globl _z
.align 4
_z:
.long 1
And this basically dictates the memory addresses.
Your second example creates uninitialised variables, and as Jonathan said, those go into BSS. The assembler code is:
.comm _w, 4, 2
.comm _x, 4, 2
.comm _y, 4, 2
.comm _z, 4, 2
And that means you can't guarantee the sequence in which those variables will be located in memory.
The second set of numbers is also consecutive, just not ordered the same as in the source. I think the reason is simply that when you initialize the variables, the compiler keeps them in the order of the initializations; in the second case you just get an arbitrary order.
In any case, this depends on the compiler; I get the same pattern (ordered, 4 bytes apart) in both cases.

Does a struct with a single member have the same performance as the member's type?

Does a struct with a single member have the same performance (memory usage and speed) as the member's type?
Example:
This code is a struct with a single member:
struct my_int
{
int value;
};
is the performance of my_int the same as int?
Agree with @harper overall, but watch out for the following:
A classic difference is seen between an "unstructured" array and an array inside a structure.
char s1[1000];
// vs
typedef struct {
char s2[1000];
} s_T;
s_T s3;
When calling functions ...
void f1(char s[1000]);
void f2(s_T s);
void f3(s_T *s);
// Significant performance difference is not expected.
// In both, only an address is passed.
f1(s1);
f1(s3.s2);
// Significant performance difference is expected.
// In the second case, a copy of the entire structure is passed.
// This style of parameter passing is usually frowned upon.
f1(s1);
f2(s3);
// Significant performance difference is not expected.
// In both, only an address is passed.
f1(s1);
f3(&s3);
In some cases, the ABI may have specific rules for returning structures and passing them to functions. For example, given
struct S { int m; };
struct S f(int a, struct S b);
int g(int a, struct S b);
calling f or g may, for example, pass a in a register, and pass b on the stack. Similarly, calling g may use a register for the return value, whereas calling f may require the caller to set up a location where f will store its result.
The performance differences of this should normally be negligible, but one case where it could make a significant difference is when this difference enables or disables tail recursion.
Suppose g is implemented as int g(int a, struct S b) { return f(a, b).m; }. Now, on an implementation where f's result is returned the same way as g's, this may compile to (actual output from clang):
.file "test.c"
.text
.globl g
.align 16, 0x90
.type g,@function
g: # @g
.cfi_startproc
# BB#0:
jmp f # TAILCALL
.Ltmp0:
.size g, .Ltmp0-g
.cfi_endproc
.section ".note.GNU-stack","",@progbits
However, on other implementations, such a tail call is not possible, so if you want to achieve the same results for a deeply recursive function, you really need to give f and g the same return type or you risk a stack overflow. (I'm aware that tail calls are not mandated.)
This doesn't mean int is faster than S, nor does it mean that S is faster than int, though. The memory use would be similar regardless of whether int or S is used, so long as the same one is consistently used.
Whether the compiler imposes any penalty for using structs instead of single variables is strictly dependent on the compiler and compiler options.
But there is no reason why the compiler should make any difference when your struct contains only one member. No additional code should be necessary to access the member, nor to dereference a pointer to such a struct. If you don't have this oversimplified one-member structure, dereferencing a member might cost one additional CPU instruction, depending on the CPU used.
A minimal example with GCC 10.2.0 -O3 gives exactly the same output, i.e. no overhead introduced by the struct:
diff -u0 <(
gcc -S -o /dev/stdout -x c -std=gnu17 -O3 -Wall -Wextra - <<EOF
// Out of the box
void OOTB(int *n){
*n+=999;
}
EOF
) <(
gcc -S -o /dev/stdout -x c -std=gnu17 -O3 -Wall -Wextra - <<EOF
// One member struct
typedef struct { int inq_n; } inq;
void OMST(inq *n){
n->inq_n+=999;
}
EOF
)
--- /dev/fd/63 [...]
+++ /dev/fd/62 [...]
@@ -4,3 +4,3 @@
- .globl OOTB
- .type OOTB, #function
-OOTB:
+ .globl OMST
+ .type OMST, #function
+OMST:
@@ -13 +13 @@
- .size OOTB, .-OOTB
+ .size OMST, .-OMST
Not sure about more realistic/complex situations.
