Why does va_arg() produce different effects on x86_64 and ARM? - c

The code:
#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>
typedef unsigned int uint32_t;
float average(int n_values, ... )
{
    va_list var_arg;
    int count;
    float sum = 0;
    va_start(var_arg, n_values);
    for (count = 0; count < n_values; count += 1) {
        sum += va_arg(var_arg, signed long long int);
    }
    va_end(var_arg);
    return sum / n_values;
}
int main(int argc, char *argv[])
{
    (void)argc;
    (void)argv;
    printf("hello world!\n");
    uint32_t t1 = 1;
    uint32_t t2 = 4;
    uint32_t t3 = 4;
    printf("result:%f\n", average(3, t1, t2, t3));
    return 0;
}
When I run it on Ubuntu (x86_64), it's OK.
lix#lix-VirtualBox:~/test/c$ ./a.out
hello world!
result:3.000000
lix#lix-VirtualBox:~/test/c$ uname -a
Linux lix-VirtualBox 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
lix#lix-VirtualBox:~/test/c$
But when I cross-compile it and run it on OpenWrt (32-bit ARM), it's wrong.
[root#OneCloud_0723:/root/lx]#./helloworld
hello world!
result:13952062464.000000
[root#OneCloud_0723:/root/lx]#uname -a
Linux OneCloud_0723 3.10.33 #1 SMP PREEMPT Thu Nov 2 19:55:17 CST 2017 armv7l GNU/Linux
I know one should not call va_arg with an argument of the incorrect type. But why do we get the right result on x86_64 and not on ARM?
Thank you.

On x86-64 Linux, each 32-bit arg is passed in a separate 64-bit register (because that's what the x86-64 System V calling convention requires).
The caller happens to have zero-extended the 32-bit arg into the 64-bit register. (This isn't required; the undefined behaviour in your program could bite you with a different caller that left high garbage in the arg-passing registers.)
The callee (average()) is looking for three 64-bit args, and looks in the same registers where the caller put them, so it happens to work.
On 32-bit ARM, long long doesn't fit in a single register, so the callee looking for long long args is definitely looking in different places than where the caller placed the uint32_t args.
The first 64-bit arg the callee sees is probably ((long long)t1<<32) | t2, or the other way around. But since the callee is looking for 6x 32 bits of args, it will be looking at registers / memory that the caller didn't intend as args at all.
(Note that this could cause corruption of the caller's locals on the stack, because the callee is allowed to clobber stack args.)
For the full details, look at the asm output of your code with your compiler + compile options to see exactly what behaviour resulted from the C Undefined Behaviour in your source. objdump -d ./helloworld should do the trick, or look at the compiler output directly: How to remove "noise" from GCC/clang assembly output?
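To make that concrete, here is a minimal well-defined sketch (not part of the answer above; the helper name average_u32 is mine): a uint32_t passed through ... is not widened to 64 bits. On both of these ABIs it arrives as an unsigned int, so the callee has to read it back with that type.
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
/* sketch: read each variadic argument with the type it actually has after
 * the default argument promotions (unsigned int here, since uint32_t is
 * unsigned int on both x86-64 Linux and 32-bit ARM Linux) */
static float average_u32(int n_values, ...)
{
    va_list ap;
    float sum = 0;
    va_start(ap, n_values);
    for (int i = 0; i < n_values; ++i)
        sum += va_arg(ap, unsigned int);
    va_end(ap);
    return sum / n_values;
}
int main(void)
{
    uint32_t t1 = 1, t2 = 4, t3 = 4;
    printf("result:%f\n", average_u32(3, t1, t2, t3)); /* result:3.000000 on both targets */
    return 0;
}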

On my system (x86_64)
#include <stdio.h>
int main(void)
{
    printf("%zu\n", sizeof(long long int));
    return 0;
}
This prints 8, which tells me that long long int is 64 bits wide; I don't know the size of a long long int on ARM.
Regardless, your va_arg call is wrong: you have to use the correct type, in this case uint32_t, so your function has undefined behaviour and only happens to get the correct values on x86_64. average should look like this:
float average(int n_values, ... )
{
    va_list var_arg;
    int count;
    float sum = 0;
    va_start(var_arg, n_values);
    for (count = 0; count < n_values; count += 1) {
        sum += va_arg(var_arg, uint32_t);
    }
    va_end(var_arg);
    return sum / n_values;
}
Also, don't declare your uint32_t as
typedef unsigned int uint32_t;
This is not portable, because int is not guaranteed to be 4 bytes wide across all architectures. The standard C library already declares this type in stdint.h; you should use those types instead.
So your program should look like this:
#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>
#include <stdint.h>
float average(int n_values, ... )
{
    va_list var_arg;
    int count;
    float sum = 0;
    va_start(var_arg, n_values);
    for (count = 0; count < n_values; count += 1) {
        sum += va_arg(var_arg, uint32_t);
    }
    va_end(var_arg);
    return sum / n_values;
}
int main(void)
{
    printf("hello world!\n");
    uint32_t t1 = 1;
    uint32_t t2 = 4;
    uint32_t t3 = 4;
    printf("result:%f\n", average(3, t1, t2, t3));
    return 0;
}
This is portable and should yield the same results across different architectures.
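If you want to check the relevant widths on each target rather than guess, here is a minimal sketch you can cross-compile and run on both machines:
#include <stdio.h>
#include <stdint.h>
int main(void)
{
    /* compare this output between the x86_64 build and the ARM build */
    printf("unsigned int : %zu bytes\n", sizeof(unsigned int));
    printf("uint32_t     : %zu bytes\n", sizeof(uint32_t));
    printf("long long    : %zu bytes\n", sizeof(long long));
    return 0;
}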

Related

Getting function location in executable [duplicate]

This question already has answers here:
How to get the length of a function in bytes?
(13 answers)
Closed 7 years ago.
I need the location of a code section in the executable (begin and end address). I tried to use two dummy functions:
void begin_address(){}
void f(){
...
}
void end_address(){}
...
printf("Function length: %td\n", (intptr_t)end_address - (intptr_t)begin_address);
The problem is that using -O4 optimization with gcc I got a negative length. It seems that this does not work with optimizations.
I compiled f to assembly, and tried the following:
__asm__(
    "func_begin:"
    "movq $10, %rax;"
    "movq $20, %rbx;"
    "addq %rbx, %rax;"
    "func_end:"
);
extern unsigned char* func_begin;
extern unsigned char* func_end;
int main(){
    printf("Function begin and end address: %p\t%p\n", func_begin, func_end);
    printf("Function length: %td\n", (intptr_t)func_end - (intptr_t)func_begin);
}
The problem is that even without optimization I am getting some strange output:
Function begin and end address: 0x480000000ac0c748 0xf5158b48e5894855
Function length: -5974716185612615411
How can I get the location of a function in the executable? My second question is whether referring to this address as const char* is safe or not. I am interested in both 32 and 64 bit solutions if there is a difference.
If you want to see how many bytes a function occupies in a binary, you can use objdump to disassemble the binary and look at the first and last instruction addresses of the function. Or you can print $ebp - $esp if you want to know how much space a function uses on the stack.
If it is a viable option for you, tell gcc to compile the needed parts with -O0 instead:
#include <stdio.h>
#include <stdint.h>
void __attribute__((optimize("O0"))) func_begin(){}
void __attribute__((optimize("O0"))) f(){
    return;
}
void __attribute__((optimize("O0"))) func_end(){}
int main()
{
    printf("Function begin and end address: %p\t%p\n", func_begin, func_end);
    printf("Function length: %td\n", (uintptr_t)func_end - (uintptr_t)func_begin);
}
I'm not sure whether __attribute__((optimize("O0"))) is needed for f().
I don't know about GCC, but in the case of some Microsoft compilers, or some versions of Visual Studio, if you build in debug mode, it creates a jump table for function entries that then jump to the actual function. In release mode, it normally doesn't use the jump table.
I thought most linkers have a map output option that would at least show the offsets of functions.
You could use an asm instruction that you can search for:
movl $12345678, %eax   # search for this instruction
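In GCC inline-asm syntax, a minimal sketch of such a marker (the constant is arbitrary; it only needs to be easy to find in the objdump -d output):
/* plants a recognisable movl at the start of f(); grep the disassembly
 * for 12345678 to locate the function */
void f(void)
{
    __asm__ volatile ("movl $0x12345678, %%eax" ::: "eax");
    /* ... real function body ... */
}
int main(void)
{
    f();
    return 0;
}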
This worked with Microsoft C / C++ 4.1, VS2005, and VS2010 release builds:
#include <stdio.h>
#include <string.h> /* for memcmp */
void swap(char **a, char **b){
    char *temp = *a;
    *a = *b;
    *b = temp;
}
void sortLine(char *a[], int size){
    int i, j;
    for (i = 0; i < size; i++){
        for (j = i + 1; j < size; j++){
            if(memcmp(a[i], a[j], 80) > 0){
                swap(&a[i], &a[j]);
            }
        }
    }
}
int main(int argc, char **argv)
{
    void (*pswap)(char **a, char **b) = swap;
    void (*psortLine)(char *a[], int size) = sortLine;
    char *pfun1 = (void *) pswap;
    char *pfun2 = (void *) psortLine;
    printf("%p %p %x\n", pfun1, pfun2, pfun2-pfun1);
    return(0);
}

How to get SIMD code from C code

I am working on a machine with an Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz. It supports SSE4.2.
I have written C code to perform an XOR operation over string bits, but I want to write the corresponding SIMD code and check for a performance improvement. Here is my code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#define LENGTH 10
unsigned char xor_val[LENGTH];
void oper_xor(unsigned char *r1, unsigned char *r2)
{
    unsigned int i;
    for (i = 0; i < LENGTH; ++i)
    {
        xor_val[i] = (unsigned char)(r1[i] ^ r2[i]);
        printf("%d",xor_val[i]);
    }
}
int main() {
    int i;
    time_t start, stop;
    double cur_time;
    start = clock();
    oper_xor("1110001111", "0000110011");
    stop = clock();
    cur_time = ((double) stop-start) / CLOCKS_PER_SEC;
    printf("Time used %f seconds.\n", cur_time / 100);
    for (i = 0; i < LENGTH; ++i)
        printf("%d",xor_val[i]);
    printf("\n");
    return 0;
}
On compiling and running the sample code I get the output shown below. The time is 0 here, but in the actual project it consumes a significant amount of time.
gcc xor_scalar.c -o xor_scalar
pan88: ./xor_scalar
1110111100 Time used 0.000000 seconds.
1110111100
How can I start writing corresponding SIMD code for SSE4.2?
The Intel Compiler and any OpenMP compiler support #pragma simd and #pragma omp simd, respectively. These are your best bet to get the compiler to do SIMD codegen for you. If that fails, you can use intrinsics or, as a means of last resort, inline assembly.
Note that the printf function calls will almost certainly interfere with vectorization, so you should remove them from any loops in which you want to see SIMD.
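As a starting point with intrinsics, here is a minimal sketch (oper_xor_simd() is a hypothetical variant of the question's oper_xor() that takes an output buffer and a length): it XORs 16 bytes per iteration with SSE2, which every SSE4.2 machine also supports, and falls back to a scalar loop for the tail.
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
void oper_xor_simd(const unsigned char *r1, const unsigned char *r2,
                   unsigned char *out, size_t len)
{
    size_t i = 0;
    /* 16 bytes per iteration */
    for (; i + 16 <= len; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(r1 + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(r2 + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_xor_si128(a, b));
    }
    /* scalar tail for the remaining len % 16 bytes */
    for (; i < len; ++i)
        out[i] = (unsigned char)(r1[i] ^ r2[i]);
}
Compile with optimization (e.g. gcc -O2). For the 10-byte inputs in the question the scalar tail does all the work, so the benefit only shows up with longer buffers.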

Error with gcc inline assembly

I'm trying to learn how to write gcc inline assembly.
The following code is supposed to perform an shl instruction and return the result.
#include <stdio.h>
#include <inttypes.h>
uint64_t rotate(uint64_t x, int b)
{
    int left = x;
    __asm__ ("shl %1, %0"
             :"=r"(left)
             :"i"(b), "0"(left));
    return left;
}
int main()
{
    uint64_t a = 1000000000;
    uint64_t res = rotate(a, 10);
    printf("%llu\n", res);
    return 0;
}
Compilation fails with error: impossible constraint in asm
The problem is basically with "i"(b). I've tried "o", "n", "m" among others, but it still doesn't work. Either it's this error or an operand size mismatch.
What am I doing wrong?
As written, your code compiles correctly for me (I have optimization enabled). However, I believe you may find this to be a bit better:
#include <stdio.h>
#include <inttypes.h>
uint64_t rotate(uint64_t x, int b)
{
    __asm__ ("shl %b[shift], %[value]"
             : [value] "+r"(x)
             : [shift] "Jc"(b)
             : "cc");
    return x;
}
int main(int argc, char *argv[])
{
    uint64_t a = 1000000000;
    uint64_t res = rotate(a, 10);
    printf("%llu\n", res);
    return 0;
}
Note that 'J' is for 64-bit. If you are using 32-bit, 'I' is the correct value.
Other things of note:
You are truncating your rotate value from uint64_t to int? Are you compiling for 32-bit code? I don't believe shl can do 64-bit rotates when compiled as 32-bit.
Allowing 'c' on the input constraint means you can use variable rotate amounts (i.e. not hard-coded at compile time).
Since shl modifies the flags, use "cc" to let the compiler know.
Using the [name] form makes the asm easier to read (IMO).
The %b is a modifier. See https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#i386Operandmodifiers
If you want to really get smart about inline asm, check out the latest gcc docs: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
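For comparison, here is a minimal sketch with no inline asm at all (the helper name shift_left is mine): a plain C shift already compiles down to a variable-count shl, which is usually preferable unless the point of the exercise is the asm syntax itself.
#include <stdio.h>
#include <inttypes.h>
static uint64_t shift_left(uint64_t x, int b)
{
    return x << (b & 63); /* mask keeps the count in range, as the hardware does */
}
int main(void)
{
    printf("%" PRIu64 "\n", shift_left(1000000000, 10)); /* prints 1024000000000 */
    return 0;
}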

How to get a pointer to a binary section in Mac OS X?

I'm writing some code which stores some data structures in a special named binary section. These are all instances of the same struct which are scattered across many C files and are not within scope of each other. By placing them all in the named section I can iterate over all of them.
This works perfectly with GCC and GNU ld. It fails on Mac OS X due to the missing __start___mysection and __stop___mysection symbols; I guess llvm ld is not smart enough to provide them automatically.
In GCC and GNU ld, I use __attribute__((section(...)) plus some specially named extern pointers which are magically filled in by the linker. Here's a trivial example:
#include <stdio.h>
extern int __start___mysection[];
extern int __stop___mysection[];
static int x __attribute__((section("__mysection"))) = 4;
static int y __attribute__((section("__mysection"))) = 10;
static int z __attribute__((section("__mysection"))) = 22;
#define SECTION_SIZE(sect) \
    ((size_t)((__stop_##sect - __start_##sect)))
int main(void)
{
    size_t sz = SECTION_SIZE(__mysection);
    int i;
    printf("Section size is %u\n", sz);
    for (i=0; i < sz; i++) {
        printf("%d\n", __start___mysection[i]);
    }
    return 0;
}
What is the general way to get a pointer to the beginning/end of a section with the FreeBSD linker? Anyone have any ideas?
For reference, the linker is:
#(#)PROGRAM:ld PROJECT:ld64-127.2
llvm version 3.0svn, from Apple Clang 3.0 (build 211.12)
Similar question was asked about MSVC here: How to get a pointer to a binary section in MSVC?
You can get the Darwin linker to do this for you.
#include <stdio.h>
extern int start_mysection __asm("section$start$__DATA$__mysection");
extern int stop_mysection __asm("section$end$__DATA$__mysection");
// If you don't reference x, y and z explicitly, they'll be dead-stripped.
// Prevent that with the "used" attribute.
static int x __attribute__((used,section("__DATA,__mysection"))) = 4;
static int y __attribute__((used,section("__DATA,__mysection"))) = 10;
static int z __attribute__((used,section("__DATA,__mysection"))) = 22;
int main(void)
{
    long sz = &stop_mysection - &start_mysection;
    long i;
    printf("Section size is %ld\n", sz);
    for (i=0; i < sz; ++i) {
        printf("%d\n", (&start_mysection)[i]);
    }
    return 0;
}
Using Mach-O information:
#include <mach-o/getsect.h>
char *secstart;
unsigned long secsize;
secstart = getsectdata("__SEGMENT", "__section", &secsize);
The above gives information about section declared as:
int x __attribute__((section("__SEGMENT,__section"))) = 123;
More information: https://developer.apple.com/library/mac/documentation/developertools/conceptual/machoruntime/Reference/reference.html
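Putting that together, here is a minimal runnable sketch of my own, based on the two snippets above. Note that on newer position-independent executables the address returned by getsectdata() may need the image's ASLR slide added (e.g. via getsectiondata()).
#include <stdio.h>
#include <mach-o/getsect.h>
/* a few values placed in __DATA,__mysection; "used" keeps them from being
 * dead-stripped even though nothing references them by name */
static int x __attribute__((used,section("__DATA,__mysection"))) = 4;
static int y __attribute__((used,section("__DATA,__mysection"))) = 10;
static int z __attribute__((used,section("__DATA,__mysection"))) = 22;
int main(void)
{
    unsigned long size = 0;
    char *start = getsectdata("__DATA", "__mysection", &size);
    if (start != NULL) {
        const int *p = (const int *)start;
        for (unsigned long i = 0; i < size / sizeof(int); ++i)
            printf("%d\n", p[i]);
    }
    return 0;
}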

What is the difference between `cc -std=c99` and `c99` on Mac OS?

Given the following program:
/* Find the sum of all the multiples of 3 or 5 below 1000. */
#include <stdio.h>
unsigned long int method_one(const unsigned long int n);
int
main(int argc, char *argv[])
{
    unsigned long int sum = method_one(1000000000);
    if (sum != 0) {
        printf("Sum: %lu\n", sum);
    } else {
        printf("Error: Unsigned Integer Wrapping.\n");
    }
    return 0;
}
unsigned long int
method_one(const unsigned long int n)
{
    unsigned long int i;
    unsigned long int sum = 0;
    for (i=1; i!=n; ++i) {
        if (!(i % 3) || !(i % 5)) {
            unsigned long int tmp_sum = sum;
            sum += i;
            if (sum < tmp_sum)
                return 0;
        }
    }
    return sum;
}
On a Mac OS system (Xcode 3.2.3), if I use cc for compilation with the -std=c99 flag, everything seems just right:
nietzsche:problem_1 robert$ cc -std=c99 problem_1.c -o problem_1
nietzsche:problem_1 robert$ ./problem_1
Sum: 233333333166666668
However, if I use c99 to compile it this is what happens:
nietzsche:problem_1 robert$ c99 problem_1.c -o problem_1
nietzsche:problem_1 robert$ ./problem_1
Error: Unsigned Integer Wrapping.
Can you please explain this behavior?
c99 is a wrapper of gcc. It exists because POSIX requires it. c99 will generate a 32-bit (i386) binary by default.
cc is a symlink to gcc, so it takes whatever default configuration gcc has. gcc produces a binary in native architecture by default, which is x86_64.
unsigned long is 32 bits wide on i386 on OS X, and 64 bits wide on x86_64. Therefore, the c99 build hits the "Unsigned Integer Wrapping" case, while cc -std=c99 does not.
You can force c99 to generate a 64-bit binary on OS X with the -W 64 flag:
c99 -W 64 problem_1.c -o problem_1
(Note: by gcc I mean the actual gcc binary like i686-apple-darwin10-gcc-4.2.1.)
Under Mac OS X, cc is a symlink to gcc (which defaults to 64-bit), and c99 is not (it defaults to 32-bit).
/usr/bin/cc -> gcc-4.2
And they use different default byte-sizes for data types.
/** sizeof.c
 */
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
    printf("sizeof(unsigned long int)==%d\n", (int)sizeof(unsigned long int));
    return EXIT_SUCCESS;
}
cc -std=c99 sizeof.c
./a.out
sizeof(unsigned long int)==8
c99 sizeof.c
./a.out
sizeof(unsigned long int)==4
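If the intent is 64-bit arithmetic regardless of which driver builds the program, a minimal sketch using the fixed-width types from stdint.h sidesteps the difference:
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
int main(void)
{
    /* uint64_t is 64 bits wide whether the binary is i386 or x86_64 */
    uint64_t sum = 0;
    for (uint64_t i = 1; i < 1000000000; ++i)
        if (i % 3 == 0 || i % 5 == 0)
            sum += i;
    printf("Sum: %" PRIu64 "\n", sum); /* Sum: 233333333166666668 on either build */
    return 0;
}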
Quite simply, you are overflowing (aka wrapping) your integer variable when using the c99 compiler.
