I have the problem described in the title. I have an Edify language parser that runs without errors when I build it for ARM but segfaults when I build it for x86. I traced the segfault to the yy_scan_bytes function, more precisely to this code:
YY_BUFFER_STATE yy_scan_bytes (yyconst char * yybytes, int _yybytes_len ) {
YY_BUFFER_STATE b;
char * buf;
yy_size_t n;
int i;
/* Get memory for full buffer, including space for trailing EOB's. */
n = _yybytes_len + 2;
buf = (char *) yyalloc(n );
if ( ! buf ) {
YY_FATAL_ERROR( "out of dynamic memory in yy_scan_bytes()" );
}
for ( i = 0; i < _yybytes_len; ++i ) {
buf[i] = yybytes[i]; // <==========
}
The full code is here: https://github.com/twaik/edify_x86_failing_code
I've got it from AROMA Installer source.
That's everything I discovered while debugging. Thanks.
Trying to build your code gives me these errors:
main.c: In function ‘parse_string’:
main.c:27:5: warning: implicit declaration of function ‘yy_switch_to_buffer’ [-Wimplicit-function-declaration]
yy_switch_to_buffer(yy_scan_string(str));
^~~~~~~~~~~~~~~~~~~
main.c:27:25: warning: implicit declaration of function ‘yy_scan_string’ [-Wimplicit-function-declaration]
yy_switch_to_buffer(yy_scan_string(str));
That means that the compiler assumes that yy_switch_to_buffer() and yy_scan_string() return an int, as it does for all functions that are not declared before use (as per the C89 standard). But that is not the case: the first returns void, and the second returns a pointer (YY_BUFFER_STATE). Notice that on x86_64 a pointer is 8 bytes while an int is 4, so the pointer returned by yy_scan_string() gets truncated.
Adding some band-aid prototypes like
void yy_switch_to_buffer(void*);
void *yy_scan_string(const char*);
to main.c, before their use in parse_string() may stop the segfaulting.
A better fix would be to arrange in the Makefile that the lexer be run with the --header-file=lex-header.h option, and then include lex-header.h from main.c. Or even better, wrap all lex-specific code in some simple functions, and put the prototypes of those functions in a header included from both main.c and the *.l file.
I have been trying to implement a small simulation to understand memory allocation of malloc(). I created a shared library called mem.c. I am linking the library to the main but cannot pass the correct address of the simulated "heap". Heap is created by a malloc() call in the shared library.
Address in the shared library: 0x55ddaff662a0
Address in the main: 0xffffffffaff662a0
Only last 4 bytes seem to be correct. Rest is set to 0xf.
However, when I #include "mem.c" in the main it works correctly. How can I achieve the same result without including mem.c? I am trying to solve this without including mem.c or mem.h. I create the shared library like this:
gcc -c -fpic mem.c
gcc -shared -o libmem.so mem.o
gcc main.c -lmem -L. -o main
From your comments
I am trying to implement without using #include mem.h or mem.c.
Then you must provide a prototype for the function you're calling by other means. Without an explicit declaration, following the tradition of K&R and later ANSI C (C89), an undeclared function is assumed to return int, and its arguments are passed after the default argument promotions.
EDIT: Essentially you need to write what you'd normally find in a header, somewhere before you first use the function. Or, if it's a function pointer, you need an appropriate variable to store it in.
For example to declare a function that returns an untyped pointer, and an arbitrary, unspecified number of arguments you'd write
void *getAddr();
Note that using the extern keyword here is not required, since extern linkage is always implied for non-static function declarations.
In case you want to dynamically link at runtime (using dlopen / LoadLibrary → dlsym / GetProcAddress), you'd define a function pointer variable
void* (*getAddr_fptr)();
You can set it using dlsym with
*(void**)(&getAddr_fptr) = dlsym(…)
This awkward way of writing it is needed because function pointers are allowed to have a different size and alignment than data pointers (see the dlsym manpage for details).
These days int is a 4-byte type on most platforms, while pointers on 64-bit platforms are 8 bytes, and the common calling conventions return small values in a register. On x86_64 that register is RAX, whose low 32 bits can be accessed on their own as EAX. This explains why only the low 4 bytes survive: the caller believes the function returns an int, so it reads only EAX, and then sign-extends that 32-bit value when it converts it to a 64-bit pointer. Because the top bit of 0xaff662a0 is set, the upper 32 bits come out as all 1s.
From the comments:
Do you have a declaration for getAddr in your main code?
No I don't have but I am trying to implement without a declaration, is it possible?
Then that's your problem. Without a declaration, the compiler falls back to a default declaration of int getAddr(). This is incompatible with the actual definition which returns a void *, and calling a function through an incompatible declaration triggers undefined behavior.
What probably happened is that when the return value of the function was actually returned you only got back the 4 low-order bytes. Assuming your system is little-endian, and int is 4 bytes, and a void * is 8 bytes, this would explain the low bits being the same.
You must include a valid declaration before the function is called. It doesn't necessarily have to reside in a header file, but it has to be visible at the point the call happens.
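The truncation described above can be reproduced with plain integer arithmetic. This is only a model, assuming a 4-byte int and 8-byte pointers as on x86_64 Linux; the function name address_seen_by_caller is made up for illustration and the real effect happens in the generated call sequence, not in code like this.

```c
#include <stdint.h>

/* Models the mismatch on a system with 4-byte int and 8-byte pointers:
 * the callee returns a 64-bit address, but the caller was compiled as if
 * the function returned a 32-bit int, and then widens that int back to
 * pointer size. */
uint64_t address_seen_by_caller(uint64_t real_address)
{
    /* Only the low 32 bits survive the "int" return value. */
    uint32_t low = (uint32_t)real_address;

    /* Widening a negative int sign-extends, filling the top half with 1s. */
    if (low & UINT32_C(0x80000000))
        return UINT64_C(0xFFFFFFFF00000000) | low;
    return low;
}
```

Feeding it the address from the question, 0x55ddaff662a0, gives 0xffffffffaff662a0, which is the value that showed up in main.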
I'm assuming you're trying to accomplish something like this? For mem.c
#include <stdlib.h>
#include <stdio.h>
void* getAddr() {
char *heap = (char *)malloc(10);
printf("%p\n", (void*)heap);
return heap;
}
And then without including any headers for the mem.c functions, you'd probably create a library out of mem.c as you've already mentioned in the question and have something as follows in main.c
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
typedef void* (*getAddr)(); //prototype for getAddr() in mem.c
int main() {
void* handle = dlopen("./libmem.so", RTLD_LAZY);
if(handle) {
void* fn = dlsym(handle, "getAddr");
if(fn) {
void* addr = ((getAddr)(fn))();
printf("%p\n", addr);
free(addr);
addr = NULL;
} else {
printf("Failed to dlsym %s\n", dlerror());
}
} else {
printf("Failed to dlopen %s\n", dlerror());
}
}
EDIT: For OP's purpose, as @Zilog80 mentioned, since the library is being linked with the main executable, the dlopen() part can be dropped and main.c can be simplified to
#include <stdio.h>
#include <stdlib.h>
extern void* getAddr(); //prototype for getAddr() in mem.c
int main() {
void* addr = getAddr();
printf("%p\n", addr);
free(addr);
addr = NULL;
}
I used compilation commands similar to OP's, i.e.
gcc -shared -o libmem.so -fpic mem.c
gcc main.c -lmem -L . -o main
while executing
LD_LIBRARY_PATH=. ./main
I was developing an embedded project and was struggling to compile it because of this error:
mipsel-linux-gnu-ld: main.o: in function 'fooBar':main.c:(.text+0x3ec): undefined reference to 'memcpy'
This error is caused by every operation similar to this, in which I assign the value of a pointer to a non-pointer type variable.
int a = 0;
int *ap = &a;
int c = *ap; //this causes the error
Here's another example:
state_t *exceptionState = (unsigned int) 0x0FFFF000;
currentProcess->cpu_state = *exceptionState; //this causes the error
I have already included the flag -nostdlib in the makefile...
Thank you in advance!
I have already included the flag -nostdlib in the makefile...
Take that flag out. It blocks linkage to standard library calls. The compiler might actually generate references to the memcpy function, even if your code doesn't explicitly call it.
If you absolutely need -nostdlib, I suppose you could define your own version of memcpy - if that's the only function the linker is complaining about. It won't be as optimized, but it would work. Add the following code to the bottom of one of your source files:
#include <stddef.h> /* size_t */
void *memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
    {
        d[i] = s[i]; /* byte-by-byte copy */
    }
    return dest; /* memcpy must return the destination pointer */
}
The fact that you have included -nostdlib is what's causing your problem.
If you copy a structure the compiler may call the standard C runtime function memcpy() to do it. If you link with -nostdlib then you're telling the linker to not include the standard C runtime library.
If you have to use -nostdlib then you'll have to provide your own implementation of memcpy().
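To make the point about structure copies concrete, here is a minimal sketch of the kind of statement involved. The state_t layout below is a hypothetical stand-in (the question does not show the real one), and copy_state is a made-up name. Nothing in the source calls memcpy, yet the compiler is allowed to implement the assignment with a call to it, which is where the undefined reference can come from.

```c
/* Hypothetical stand-in for the question's state_t. */
typedef struct {
    unsigned int regs[32];
    unsigned int pc;
    unsigned int status;
} state_t;

/* Plain struct assignment: no memcpy in the source, but the compiler is
 * free to implement the copy with a call to memcpy. */
void copy_state(state_t *dst, const state_t *src)
{
    *dst = *src;
}
```

Whether a call is actually emitted depends on the compiler, target, struct size and optimization level, so the linker error may appear for some assignments and not for others.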
To start off, I don't get this issue when I compile/"make" the code on a Linux machine which I connect to remotely. I'm experiencing it only on my Windows laptop with MinGW installed, which I believe is causing the issue.
$ make
gcc -c parser.c
parser.c:34:7: error: conflicting types for 'gets'
34 | char* gets(char *buf, int max)
| ^~~~
In file included from parser.h:4,
from parser.c:1:
c:\mingw\include\stdio.h:709:41: note: previous declaration of 'gets' was here
709 | _CRTIMP __cdecl __MINGW_NOTHROW char * gets (char *);
| ^~~~
Makefile:13: recipe for target 'parser.o' failed
make: *** [parser.o] Error 1
Here's the gets() code as requested:
char* gets(char *buf, int max)
{
int i, cc;
char c;
for(i=0; i+1 < max; ){
cc = read(0, &c, 1);
if(cc < 1) break;
//c = getchar();
buf[i++] = c;
if(c == '\n' || c == '\r')
break;
}
buf[i] = '\0';
return buf;
}
Is there a way to fix this without changing the gets function name? Thank you sm
Your code works on Linux's gcc because the gets function was removed, as it should, since it was deprecated in the C99 standard and removed with C11.
For some reason the Windows MingW distribution still maintains gets and because of that you have a redefinition problem.
So unfortunately you can't use that function name, unless you remove it by hand from stdio.h, as C doesn't allow for function overloading.
As the error says the gets() function is already defined in stdio.h.
One trick you can do is put something like this:
#define gets MY_gets
before your definition of the gets() function.
That way you are actually defining a MY_gets() function which causes no conflict. And when you call gets() later on in your code you are actually calling MY_gets().
If you define gets() in a header file you should include stdio.h first and then put #define gets MY_gets before the declaration of gets() in the header file.
Though I don't see why you want to redefine this function if it already exists.
It makes more sense to only define it if needed and surround the function with something like #ifndef HAVE_GETS and #endif, where HAVE_GETS should be defined based on tests done in the configure/build system.
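Put together, the #define trick could look like the sketch below. It assumes a POSIX-style read() from unistd.h, as in the question's code, and keeps the question's two-argument signature; MY_gets is the replacement name suggested above.

```c
#include <stdio.h>   /* must come before the #define below */
#include <unistd.h>  /* read() */

/* From here on, every "gets" in this file is really MY_gets. */
#define gets MY_gets

char *gets(char *buf, int max)
{
    int i = 0;
    char c;

    /* Read one byte at a time from standard input (fd 0). */
    while (i + 1 < max) {
        if (read(0, &c, 1) < 1)
            break;
        buf[i++] = c;
        if (c == '\n' || c == '\r')
            break;
    }
    buf[i] = '\0';
    return buf;
}
```

Every file that calls this function needs the same #define (after including stdio.h), otherwise its calls would go to the library's one-argument gets instead.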
I'm creating a cross-system application. It uses, for example, the function itoa, which is implemented on some systems but not all. If I simply provide my own itoa implementation, I get:
header.h:115:13: error: conflicting types for 'itoa'
extern void itoa(int, char[]);
In file included from header.h:2:0,
from file.c:2:0,
c:\path\to\mingw\include\stdlib.h:631:40: note: previous declaration of 'itoa' was here
_CRTIMP __cdecl __MINGW_NOTHROW char* itoa (int, char*, int);
I know I can check if macros are predefined and define them if not:
#ifndef _SOME_MACRO
#define _SOME_MACRO 45
#endif
Is there a way to check if a C function is pre-implemented, and if not, implement it? Or to simply un-implement a function?
Given you have already written your own implementation of itoa(), I would recommend that you rename it and use it everywhere. At least you are sure you will get the same behavior on all platforms, and avoid the linking issue.
Don't forget to explain your choice in the comments of your code...
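As a rough illustration of the rename approach, a self-contained conversion under a name that cannot clash might look like this. The name app_itoa and the three-argument signature are assumptions made for the example; unlike some platform versions of itoa, this sketch prints a minus sign for negative values in every base.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical name: app_itoa avoids clashing with any platform itoa.
 * buf must hold at least sizeof(int) * CHAR_BIT + 2 characters. */
char *app_itoa(int value, char *buf, int base)
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    char tmp[sizeof(int) * CHAR_BIT + 1]; /* enough digits for base 2 */
    size_t n = 0, out = 0;
    unsigned int mag;

    if (base < 2 || base > 36) {
        buf[0] = '\0'; /* unsupported base: return an empty string */
        return buf;
    }

    /* Work on the magnitude as unsigned so INT_MIN does not overflow. */
    mag = (value < 0) ? 0u - (unsigned int)value : (unsigned int)value;

    /* Produce digits least significant first. */
    do {
        tmp[n++] = digits[mag % (unsigned int)base];
        mag /= (unsigned int)base;
    } while (mag != 0);

    if (value < 0)
        buf[out++] = '-';
    while (n > 0)
        buf[out++] = tmp[--n]; /* reverse into the caller's buffer */
    buf[out] = '\0';
    return buf;
}
```

For example, app_itoa(255, buf, 16) leaves "ff" in buf and app_itoa(-10, buf, 10) leaves "-10".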
I assume you are using GCC, as I can see MinGW in your path... there's one way the GNU linker can take care of this for you. So you don't know whether there is an itoa implementation or not. Try this:
Create a new file (without any headers) called my_itoa.c:
char *itoa (int, char *, int);
char *my_itoa (int a, char *b, int c)
{
return itoa(a, b, c);
}
Now create another file, impl_itoa.c. Here, write the implementation of itoa but mark it as a weak symbol:
char* __attribute__ ((weak)) itoa(int a, char *b, int c)
{
// implementation here
}
Compile all of the files, with impl_itoa.c at the end.
This way, if itoa is not available in the standard library, this one will be linked. You can be confident about it compiling whether or not it's available.
Ajay Brahmakshatriya's suggestion is a good one, but unfortunately MinGW doesn't support weak definition last I checked (see https://groups.google.com/forum/#!topic/mingwusers/44B4QMPo8lQ, for instance).
However, I believe weak references do work in MinGW. Take this minimal example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
__attribute__ ((weak)) char* itoa (int, char*, int);
char* my_itoa (int a, char* b, int c)
{
if(itoa != NULL) {
return itoa(a, b, c);
} else {
// toy implementation for demo purposes
// replace with your own implementation
strcpy(b, "no itoa");
return b;
}
}
int main()
{
char *str = malloc((sizeof(int)*3+1));
my_itoa(10, str, 10);
printf("str: %s\n", str);
return 0;
}
If the system provides an itoa implementation, that should be used and the output would be
str: 10
Otherwise, you'll get
str: no itoa
There are two really important related points worth making here along the "don't do it like this" lines:
Don't use itoa because it's not safe.
Don't use itoa because it's not a standard function, and there are good standard functions (such as snprintf) which are available to do what you want.
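For the plain decimal case, the snprintf route might look like the following sketch. The helper name int_to_string and its return convention are assumptions for illustration; snprintf itself is standard since C99 and never writes past the size it is given.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes n in decimal into buf.
 * Returns 0 on success, -1 if buf (of the given size) is too small. */
int int_to_string(int n, char *buf, size_t size)
{
    /* snprintf returns the length the full result would have needed. */
    int len = snprintf(buf, size, "%d", n);
    return (len >= 0 && (size_t)len < size) ? 0 : -1;
}
```

Because the buffer size is passed in, a too-small buffer is reported to the caller instead of being overrun.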
But, putting all this aside for one moment, I want to introduce you to autoconf, part of the GNU build system. autoconf is part of a very comprehensive, very portable set of tools which aim to make it easier to write code which can be built successfully on a wide range of target systems. Some would argue that autoconf is too complex a system to solve just the one problem you pose with just one library function, but as any program grows, it's likely to face more hurdles like this, and getting autoconf set up for your program now will put you in a much stronger position for the future.
Start with a file called Makefile.in which contains:
CFLAGS=--ansi --pedantic -Wall -W
program: program.o
program.o: program.c
clean:
rm -f program.o program
and a file called configure.ac which contains:
AC_PREREQ([2.69])
AC_INIT(program, 1.0)
AC_CONFIG_SRCDIR([program.c])
AC_CONFIG_HEADERS([config.h])
# Checks for programs.
AC_PROG_CC
# Checks for library functions.
AH_TEMPLATE([HAVE_ITOA], [Set to 1 if function itoa() is available.])
AC_CHECK_FUNC([itoa],
[AC_DEFINE([HAVE_ITOA], [1])]
)
AC_CONFIG_FILES([Makefile])
AC_OUTPUT
and a file called program.c which contains:
#include <stdio.h>
#include "config.h"
#ifndef HAVE_ITOA
/*
* WARNING: This code is for demonstration purposes only. Your
* implementation must have a way of ensuring that the size of the string
* produced does not overflow the buffer provided.
*/
void itoa(int n, char* p) {
sprintf(p, "%d", n);
}
#endif
int main(void) {
char buffer[100];
itoa(10, buffer);
printf("Result: %s\n", buffer);
return 0;
}
Now run the following commands in turn:
autoheader: This generates a new file called config.h.in which we'll need later.
autoconf: This generates a configuration script called configure
./configure: This runs some tests, including checking that you have a working C compiler and, because we've asked it to, whether an itoa function is available. It writes its results into the file config.h for later.
make: This compiles and links the program.
./program: This finally runs the program.
During the ./configure step, you'll see quite a lot of output, including something like:
checking for itoa... no
In this case, you'll see that the config.h file contains the following lines:
/* Set to 1 if function itoa() is available. */
/* #undef HAVE_ITOA */
Alternatively, if you do have itoa available, you'll see:
checking for itoa... yes
and this in config.h:
/* Set to 1 if function itoa() is available. */
#define HAVE_ITOA 1
You'll see that the program can now read the config.h header and choose to define itoa if it's not present.
Yes, it's a long way round to solve your problem, but you've now started using a very powerful tool which can help you in a great number of ways.
Good luck!
I am reading Microsoft's CRT source code, and I came up with the following code, where the function __initstdio1 is executed before the main() routine.
The question is, how to execute some code before entering the main() routine in VC (not VC++ code)?
#include <stdio.h>
#pragma section(".CRT$XIC",long,read)
int __cdecl __initstdio1(void);
#define _CRTALLOC(x) __declspec(allocate(x))
_CRTALLOC(".CRT$XIC") static pinit = __initstdio1;
int z = 1;
int __cdecl __initstdio1(void) {
z = 10;
return 0;
}
int main(void) {
printf("Some code before main!\n");
printf("z = %d\n", z);
printf("End!\n");
return 0;
}
The output will be:
Some code before main!
z = 10
End!
However, I am not able to understand the code.
I googled .CRT$XIC but had no luck. Can some expert explain the above code segment to me, especially the following:
What does this line _CRTALLOC(".CRT$XIC") static pinit = __initstdio1; mean? What is the significance of the variable pinit?
During compilation the compiler (cl.exe) throws a warning saying as below:
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.30729.01 for 80x86
Copyright (C) Microsoft Corporation. All rights reserved.
stdmacro.c
stdmacro.c(9) : warning C4047: 'initializing' : 'int' differs in levels of indirection from 'int (__cdecl *)(void)'
Microsoft (R) Incremental Linker Version 9.00.30729.01
Copyright (C) Microsoft Corporation. All rights reserved.
/out:stdmacro.exe
stdmacro.obj
What corrective action is needed to remove the warning message?
Thanks in advance.
Added:
I have modified the code and given pinit the type _PIFV. Now the warning message is gone.
The new code is as follows:
#include <stdio.h>
#pragma section(".CRT$XIC1",long,read)
int __cdecl __initstdio1(void);
typedef int (__cdecl *_PIFV)(void);
#define _CRTALLOC(x) __declspec(allocate(x))
_CRTALLOC(".CRT$XIC1") static _PIFV pinit1 = __initstdio1;
int z = 1;
int __cdecl __initstdio1(void) {
z = 100;
return 0;
}
int main(void) {
printf("Some code before main!\n");
printf("z = %d\n", z);
printf("End!\n");
return 0;
}
A simple way to do this.
#include <iostream>
int before_main()
{
std::cout << "before main" << std::endl;
return 0;
}
static int n = before_main();
int main(int argc, char* argv[])
{
std::cout << "in main" << std::endl;
}
This is how _CRTALLOC is used in the CRT sources:
extern _CRTALLOC(".CRT$XIA") _PVFV __xi_a[];
extern _CRTALLOC(".CRT$XIZ") _PVFV __xi_z[];// C initializers
extern _CRTALLOC(".CRT$XCA") _PVFV __xc_a[];
extern _CRTALLOC(".CRT$XCZ") _PVFV __xc_z[];// C++ initializers
It's a table of things to pre-initialise, into which a pointer to your function __initstdio1 is placed.
This page describes CRT initialisation:
http://msdn.microsoft.com/en-us/library/bb918180.aspx
In C++ at least, you don't need all that implementation specific stuff:
#include <iostream>
struct A {
A() { std::cout << "before main" << std::endl; }
};
A a;
int main() {
std::cout << "in main" << std::endl;
}
I wrote an award-winning article about this on CodeGuru a while ago.
There's some information here (search for CRT). The significance of variable pinit is none, it's just a piece of data placed in the executable, where the runtime can find it. However, I would advise you to give it a type, like this:
_CRTALLOC(".CRT$XIC") static void (*pinit)()=...
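The warning comes from the compiler rather than the linker: pinit is declared without a type, so it defaults to int, and initializing an int with a function pointer is what C4047 reports. Giving pinit a function-pointer type that matches __initstdio1 makes the warning go away.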
Even in C, there is a need for some code to be run before main() is entered, if only to transform the command line into the C calling convention. In practice, the standard library needs some initialization, and the exact needs can vary from compile to compile.
The true program entry point is set at link time, and is usually in a module named something like crt0 for historical reasons. As you've found, the source to that module is available in the crt sources.
To support initializations that are discovered at link time, a special segment is used. Its structure is a list of function pointers of fixed signature, which will be iterated early in crt0 and each function called. This same array (or one very much like it) of function pointers is used in a C++ link to hold pointers to constructors of global objects.
The array is filled in by the linker by allowing every module linked to include data in it, which are all concatenated together to form the segment in the finished executable.
The only significance to the variable pinit is that it is declared (by the _CRTALLOC() macro) to be located in that segment, and is initialized to the address of a function to be called during the C startup.
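The walk over that table can be modelled in portable C as below. This is a simplified sketch, not the CRT's actual code: the names init_fn and run_initializers are invented, and the real table is assembled by the linker from the .CRT$XI* sections rather than written out as an ordinary array.

```c
#include <stddef.h>

typedef int (*init_fn)(void);   /* same shape as the _PIFV typedef above */

/* Calls every non-NULL entry in [begin, end) in order.
 * Stops and returns the first non-zero result, else returns 0. */
int run_initializers(init_fn *begin, init_fn *end)
{
    for (; begin < end; ++begin) {
        if (*begin != NULL) {
            int rc = (*begin)();
            if (rc != 0)
                return rc;
        }
    }
    return 0;
}
```

In this model, pinit would simply be one slot of the array between begin and end, which is why its own name never matters to the startup code.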
Obviously, the details of this are extremely platform-specific. For general programming, you are probably better served by wrapping your initialization and your current main inside a new main():
int main(int argc, char **argv) {
early_init();
init_that_modifies_argv(&argc, &argv);
// other pre-main initializations...
return real_main(argc,argv);
}
For special purposes, modifying the crt0 module itself or doing compiler-specific tricks to get additional early initialization functions called can be the best answer. For example, when building embedded systems that run from ROM without an operating system loader, it is common to need to customize the behavior of the crt0 module in order to have a stack at all on which to push the parameters to main(). In that case, there may be no better solution than to modify crt0 to initialize the memory hardware to suit your needs.