I have the following C code on Solaris 5.10 64-bits compiled with CC 5.10 with flags -m64 -KPIC -x04
header.h
typedef struct functions
{
double (* pfComputeGeneric) (myStruct *, myStruct *, double, double *, int);
} functions;
...
double myCompute(myStruct *, myStruct *, double, double *, int);
source.c
double myCompute(myStruct * px1, myStruct *px2, double d1, double *pd1, int i1)
{
// Do stuff with px1
}
...
myStruct *pxStruct = alloc(...);
functions *pxFunctions = alloc(...);
pxFunctions->pfComputeGeneric = myCompute;
...
double dResult += pxFunctions->pfComputeGeneric(pxStruct, pxStruct, 0.0, NULL, 0);
The code in source.c runs fine (nothing weird) until I enter enter into myCompute through function pointer pfCompute, where px1 gets corrupted. I don't know why.
Replacing the call through pfCompute by a direct call to myCompute solves the issue.
Removing the -x04 option also solves the issue.
I had a look at the answer of this question but I'm sure I'm not messing with pointer sizes.
I think it is indeed an issue of -x04. When I look at the assemby call, I see:
...
0x0000000000987eb2: myCaller+0x081a: movq 0xfffffffffffffe28(%rbp),%rcx
0x0000000000987eb9: myCaller+0x0821: movq $0x0000000000000006,%rax
0x0000000000987ec0: myCaller+0x0828: movq 0xfffffffffffffe08(%rbp),%rdi
0x0000000000987ec7: myCaller+0x082f: call *0x0000000000000018(%rdi)
0x0000000000987eca: myCaller+0x0832: addq $0x0000000000000010,%rsp
So the compiler uses %rdi (!) to get the real adress of myCompute from pxFunctions. And in 64-bits, %rdi is used to store the first argument of a function, hence the alteration.
Related
The Effective Type rule in C99 and C11 provides that storage with no declared type may be written with any type and, that storing a value of a non-character type will set the Effective Type of the storage accordingly.
Setting aside the fact that INT_MAX might be less than 123456789, would the following code's use of the Effective Type rule be strictly conforming?
#include <stdlib.h>
#include <stdio.h>
/* Performs some calculations using using int, then float,
then int.
If both results are desired, do_test(intbuff, floatbuff, 1);
For int only, do_test(intbuff, intbuff, 1);
For float only, do_test(floatbuff, float_buff, 0);
The latter two usages require storage with no declared type.
*/
void do_test(void *p1, void *p2, int leave_as_int)
{
*(int*)p1 = 1234000000;
float f = *(int*)p1;
*(float*)p2 = f*2-1234000000.0f;
if (leave_as_int)
{
int i = *(float*)p2;
*(int*)p1 = i+567890;
}
}
void (*volatile test)(void *p1, void *p2, int leave_as_int) = do_test;
int main(void)
{
int iresult;
float fresult;
void *p = malloc(sizeof(int) + sizeof(float));
if (p)
{
test(p,p,1);
iresult = *(int*)p;
test(p,p,0);
fresult = *(float*)p;
free(p);
printf("%10d %15.2f\n", iresult,fresult);
}
return 0;
}
From my reading of the Standard, all three usages of the function described in the comment should be strictly conforming (except for the integer-range issue). The code should thus output 1234567890 1234000000.00. GCC 7.2, however, outputs 1234056789 1157904.00. I think that when leave_as_int is 0, it's storing 123400000 to *p1 after it stores 123400000.0f to *p2, but I see nothing in the Standard that would authorize such behavior. Am I missing anything, or is gcc non-conforming?
Yes, this is a gcc bug. I've filed it (with a simplified testcase) as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82697 .
The generated machine code unconditionally writes to both pointers:
do_test:
cmpl $1, %edx
movl $0x4e931ab1, (%rsi)
sbbl %eax, %eax
andl $-567890, %eax
addl $1234567890, %eax
movl %eax, (%rdi)
ret
This is a GCC bug because all stores are expected to change the dynamic type of the memory accessed. I do not think this behavior is mandated by the standard; it is a GCC extension.
Can you file a GCC bug?
I have a HIGHLY performance critical section in my code where i need to minimize cpu load as much as i can. If i have a struct that has one instance, is there ANY difference in performance between defining variables in code by iteself like this:
int something;
int randomVariable;
or defining them in struct?
struct Test
{
int something;
int randomVariable;
}
because i want to use struct to make code look better
In my opinion the best way to know this , first write two different programs in C , with struct and without struct and then make them assembly file with using
gcc -S file.c
Since i dont know your code , i directly assigned values to them :
int main() {
int something;
int randomVariable;
something = 3;
randomVariable = 3;
return 0;}
and
main() {
struct Test
{
int something;
int randomVariable;
}test;
test.something = 3;
test.randomVariable = 3;
return 0;}
and i get assembly files on my Ubuntu-64bit , intel i5 machine
I saw that assembly files are nearly same
.file "test1.c"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $3, -8(%rbp) **Second one(with struct) has value -16 instead -8**
movl $3, -4(%rbp) **Second one has value -12 instead of -4**
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4"
.section .note.GNU-stack,"",#progbits
So according to that results I can say that two implementation has not any significant difference about CPU load. Only difference between them second one is using very very little more memory than first one.
First, to be fair, because i want to use struct to make code look better is purely a style thing. What looks better to one person may not look better to another.
I am a fan of struct when there is a choice, for several reasons.
Speed/size efficiency:
Compare a struct over over two discrete int variables when data needed to be passed as a function argument.
Using:
int a;
int b;
Or
typedef struct {
int a;
int b;
}VAR;
VAR var;
The same data could be passed as separate pointers via function arguments (assuming 32 bit addressing):
int func1(int *a, int *b);//two 32 bit addresses passed
Or:
int func2(VAR *v);//one 32 bit address passed
The efficiency (of this type) goes up directly as number of variable goes up.
(efficiency gain if there were 100 ints?)
In the first example, you are passing two int *, while in the second, only one. Its a small difference, but it is a difference. The magnitude of the advantage is also dependent on addressing used. 32bit or 64bit.
Code maintenance and readability:
function prototypes, when used as application programming interface (API) should be stable. Using struct as arguments or as a return type support this interface stability.
For example: Given a requirement to calculate the changing velocity in Cartesian coordinates of x, y & z, of an object moving in a straight line with respect to time, you might design a function that would be called repeatedly with current values of velocityxyz and accelerationxyz and timems. The number of arguments required clearly suggest use of a struct. Struct is also suggested as the return type:
typedef struct {
double x;
double y;
double z;
}VEL; //velocity
typedef struct {
double x;
double y;
double z;
}ACC; //acceleration
typedef struct {
VEL vel;
ACC acc;
time_t ms;
}KIN; //kinematics
KIN * GetVelocity(KIN *newV);
If a new requirement for knowing Positionxyz was added to the project, all that would have to be added is a new member of the KIN struct:
typedef struct {
double x;
double y;
double z;
}POS; //position
...
typedef struct {
POS pos;
VEL vel;
ACC acc;
time_t ms;
}KIN; //kinematics
KIN * GetVelocity(KIN *newV);//prototype will continue to work
//without changing interface (argument list)
KIN * GetPosition(KIN *newV);//new prototype for position also supported
When I attempt to run this code as it is, I receive the compiler message "error: incompatible types in return". I marked the location of the error in my code. If I take the line out, then the compiler is happy.
The problem is I want to return a value representing invalid input to the function (which in this case is calling f2(2).) I only want a struct returned with data if the function is called without using 2 as a parameter.
I feel the only two ways to go is to either:
make the function return a struct pointer instead of a dead-on struct but then my caller function will look funny as I have to change y.b to y->b and the operation may be slower due to the extra step of fetching data in memory.
Allocate extra memory, zero-byte fill it, and set the return value to the struct in that location in memory. (example: return x[nnn]; instead of return x[0];). This approach will use more memory and some processing to zero-byte fill it.
Ultimately, I'm looking for a solution that will be fastest and cleanest (in terms of code) in the long run. If I have to be stuck with using -> to address members of elements then I guess that's the way to go.
Does anyone have a solution that uses the least cpu power?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct{
long a;
char b;
}ab;
static char dat[500];
ab f2(int set){
ab* x=(ab*)dat;
if (set==2){return NULL;}
if (set==1){
x->a=1;
x->b='1';
x++;
x->a=2;
x->b='2';
x=(ab*)dat;
}
return x[0];
}
int main(){
ab y;
y=f2(1);
printf("%c",y.b);
y.b='D';
y=f2(0);
printf("%c",y.b);
return 0;
}
If you care about speed, it is implementation specific.
Notice that the Linux x86-64 ABI defines that a struct of two (exactly) scalar members (that is, integers, doubles, or pointers, -which all fit in a single machine register- but not struct etc... which are aggregate data) is returned thru two registers (without going thru the stack), and that is quite fast.
BTW
if (set==2){return NULL;} //wrong
is obviously wrong. You could code:
if (set==2) return (aa){0,0};
Also,
ab* x=(ab*)dat; // suspicious
looks suspicious to me (since you return x[0]; later). You are not guaranteed that dat is suitably aligned (e.g. to 8 or 16 bytes), and on some platforms (notably x86-64) if dat is misaligned you are at least losing performance (actually, it is undefined behavior).
BTW, I would suggest to always return with instructions like return (aa){l,c}; (where l is an expression convertible to long and c is an expression convertible to char); this is probably the easiest to read, and will be optimized to load the two return registers.
Of course if you care about performance, for benchmarking purposes, you should enable optimizations (and warnings), e.g. compile with gcc -Wall -Wextra -O2 -march=native if using GCC; on my system (Linux/x86-64 with GCC 5.2) the small function
ab give_both(long x, char y)
{ return (ab){x,y}; }
is compiled (with gcc -O2 -march=native -fverbose-asm -S) into:
.globl give_both
.type give_both, #function
give_both:
.LFB0:
.file 1 "ab.c"
.loc 1 7 0
.cfi_startproc
.LVL0:
.loc 1 7 0
xorl %edx, %edx # D.2139
movq %rdi, %rax # x, x
movb %sil, %dl # y, D.2139
ret
.cfi_endproc
you see that all the code is using registers, and no memory is used at all..
I would use the return value as an error code, and the caller passes in a pointer to his struct such as:
int f2(int set, ab *outAb); // returns -1 if invalid input and 0 otherwise
I'm finding I'm spending a lot of time trying to determine member offsets of structures while debugging. I was wondering if there was a quick way to determine the offset of a member within a large structure at compile time. (Note: I know several ways of creating compile time assertions that an offset is in a given range, and I can do a binary search for the correct value, but I'm looking for something more efficient). I'm using a fairly recent version of gcc to compile C code.
offsetof is what you want for this and it compile time. The C99 draft standard section 7.17 Common definitions paragraph 3 says:
offsetof(type, member-designator)
which expands to an integer constant expression that has type size_t, the value of
which is the offset in bytes [...]
the man pages linked above has the following sample code, which demonstrates it's usage:
struct s {
int i;
char c;
double d;
char a[];
};
/* Output is compiler dependent */
printf("offsets: i=%ld; c=%ld; d=%ld a=%ld\n",
(long) offsetof(struct s, i),
(long) offsetof(struct s, c),
(long) offsetof(struct s, d),
(long) offsetof(struct s, a));
printf("sizeof(struct s)=%ld\n", (long) sizeof(struct s));
sample output, which may vary:
offsets: i=0; c=4; d=8 a=16
sizeof(struct s)=16
For reference constant expressions are covered in section 6.6 Constant expressions and paragraph 2 says:
A constant expression can be evaluated during translation rather than runtime, and
accordingly may be used in any place that a constant may be.
Update
After clarification from the OP I came up with a new methods using -O0 -fverbose-asm -S and grep, code is as follows:
#include <stddef.h>
struct s {
int i;
char c;
double d;
char a[];
};
int main()
{
int offsetArray[4] = { offsetof(struct s, i ), offsetof( struct s, c ), offsetof(struct s, d ), offsetof(struct s, a ) } ;
return 0 ;
}
build using:
gcc -x c -std=c99 -O0 -fverbose-asm -S main.cpp && cat main.s | grep offsetArray
sample output (live example):
movl $0, -16(%rbp) #, offsetArray
movl $4, -12(%rbp) #, offsetArray
movl $8, -8(%rbp) #, offsetArray
movl $16, -4(%rbp) #, offsetArray
Ok, answering my own question here: Note: I'm looking to determine the offset at compile time, that is, I don't want to have to run the code (I can also compile just the files I need, and not the whole system): The following can be cut and paste for those who are interested:
#include <stddef.h>
#define offsetof_ct(structname, membername) \
void ___offset_##membername ## ___(void) { \
volatile char dummy[10000 + offsetof(structname, membername) ]; \
dummy[0]=dummy[0]; \
}
struct x {
int a[100];
int b[20];
int c[30];
};
offsetof_ct(struct x,a);
offsetof_ct(struct x,b);
offsetof_ct(struct x,c);
And then run:
~/tmp> gcc tst.c -Wframe-larger-than=1000
tst.c: In function ‘___offset_a___’:
tst.c:16:1: warning: the frame size of 10000 bytes is larger than 1000 bytes
tst.c: In function ‘___offset_b___’:
tst.c:17:1: warning: the frame size of 10400 bytes is larger than 1000 bytes
tst.c: In function ‘___offset_c___’:
tst.c:18:1: warning: the frame size of 10480 bytes is larger than 1000 bytes
/usr/lib/gcc/x86_64-redhat-linux/4.5.1/../../../../lib64/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
collect2: ld returned 1 exit status
I then subtract 10000 to get the offset. Note: I tried adding #pragma GCC diagnostic warning "-Wframe-larger-than=1000" into the file, but it didn't like it, so it'll have to be specified on the command line.
John
This should do:
#define OFFSETOF(T, m) \
(size_t) (((char *) &(((T*) NULL)->m)) - ((char *) ((T*) NULL)))
Call it like this:
size_t off = OFFSETOF(struct s, c);
Does struct with a single member have the same performance as a member type (memory usage and speed)?
Example:
This code is a struct with a single member:
struct my_int
{
int value;
};
is the performance of my_int same as int ?
Agree with #harper overall, but watch out for the following:
A classic difference is seen with a "unstructured" array and an array in a structure.
char s1[1000];
// vs
typedef struct {
char s2[1000];
} s_T;
s_T s3;
When calling functions ...
void f1(char s[1000]);
void f2(s_T s);
void f3(s_T *s);
// Significant performance difference is not expected.
// In both, only an address is passed.
f1(s1);
f1(s3.s2);
// Significant performance difference is expected.
// In the second case, a copy of the entire structure is passed.
// This style of parameter passing is usually frowned upon.
f1(s1);
f2(s3);
// Significant performance difference is not expected.
// In both, only an address is passed.
f1(s1);
f3(&s3);
In some cases, the ABI may have specific rules for returning structures and passing them to functions. For example, given
struct S { int m; };
struct S f(int a, struct S b);
int g(int a, S b);
calling f or g may, for example, pass a in a register, and pass b on the stack. Similarly, calling g may use a register for the return value, whereas calling f may require the caller to set up a location where f will store its result.
The performance differences of this should normally be negligible, but one case where it could make a significant difference is when this difference enables or disables tail recursion.
Suppose g is implemented as int g(int a, struct S b) { return g(a, b).m; }. Now, on an implementation where f's result is returned the same way as g's, this may compile to (actual output from clang)
.file "test.c"
.text
.globl g
.align 16, 0x90
.type g,#function
g: # #g
.cfi_startproc
# BB#0:
jmp f # TAILCALL
.Ltmp0:
.size g, .Ltmp0-g
.cfi_endproc
.section ".note.GNU-stack","",#progbits
However, on other implementations, such a tail call is not possible, so if you want to achieve the same results for a deeply recursive function, you really need to give f and g the same return type or you risk a stack overflow. (I'm aware that tail calls are not mandated.)
This doesn't mean int is faster than S, nor does it mean that S is faster than int, though. The memory use would be similar regardless of whether int or S is used, so long as the same one is consistently used.
If the compiler has any penalty on using structs instead of single variables is strictly compiler and compiler options dependent.
But there are no reasons why the compiler should make any differences when your struct contains only one member. There should be additional code necessary to access the member nor to derefence any pointer to such an struct. If you don't have this oversimplified structure with one member deferencing might cost one addtional CPU instruction depending on the used CPU.
A minimal example w/ GCC 10.2.0 -O3 gives exactly the same output i.e. no overhead introduced by struct:
diff -u0 <(
gcc -S -o /dev/stdout -x c -std=gnu17 -O3 -Wall -Wextra - <<EOF
// Out of the box
void OOTB(int *n){
*n+=999;
}
EOF
) <(
gcc -S -o /dev/stdout -x c -std=gnu17 -O3 -Wall -Wextra - <<EOF
// One member struct
typedef struct { int inq_n; } inq;
void OMST(inq *n){
n->inq_n+=999;
}
EOF
)
--- /dev/fd/63 [...]
+++ /dev/fd/62 [...]
## -4,3 +4,3 ##
- .globl OOTB
- .type OOTB, #function
-OOTB:
+ .globl OMST
+ .type OMST, #function
+OMST:
## -13 +13 ##
- .size OOTB, .-OOTB
+ .size OMST, .-OMST
Not sure about more realistic/complex situations.