Accessing an Undefined External Array - c

I'm working through K&R so pardon my mediocre understanding of C. I'm trying to store the values of argv as ints in an external array. So my first question is: is it possible to create an external array with size dependent on argc? Or is there any workaround other than just using an arbitrarily long array and hoping it all fits?
Second, I was experimenting with using a pointer to an undefined integer array. I was able to increment the pointer a large number of times, both reading from (almost everything was zero) and writing to the pointed-to memory before I got a "Bus error: 10." Is there a reason that I was able to access so much memory before I got a bus error, or is that all just part of "undefined behavior"?
Here is the code that tested the undefined array.
#include <stdio.h>
int main(void);
int test();
int a[];
int main(void)
{
    test();
}
int test()
{
    extern int a[];
    int *a0 = a;
    printf("%d\n", *a0);
    while (1) {
        *a0 = 1;
        a0++;
        printf("%d\n", *a0);
    }
    return 0;
}

is it possible to create an external array with size dependent on argc?
Not really. You can allocate an array dynamically and store a pointer to it in a global variable, but that is a pointer, not an array.
I was able to increment the pointer a large number of times, both reading from (almost everything was zero) and writing to the pointed memory before I got a "Bus error: 10."
The very first read or write outside the array is illegal. The fact that you do not get "Bus error: 10" right away is an unfortunate coincidence, because the code may appear to work when it is actually incorrect.
is that all just part of "undefined behavior?"
Yes, this is undefined behavior.

Firstly, let's clarify a bit what extern does. In C, there's a thing called a translation unit. For now, imagine translation units as being just .c files. When you compile them, each .c file forms a translation unit and produces an object file (.o). The linker then puts the object files together to create the final executable. Sometimes you want to define (i.e. "create") something, like an array, in one translation unit, but use it in multiple translation units. This is fine as far as C is concerned: everything MUST be defined ONCE, but can be used multiple times. However, you need to tell the compiler that you want the array defined in another translation unit, rather than defining another array. To do this, you use the extern keyword.
In your case, your array a is just a global array that can be accessed from anywhere in this translation unit (i.e. in this file). So you don't really need to declare it again in the test function using the extern keyword.
What you really want to ask is whether you can create a global array with a size equal to argc.
Up until C99 (K&R is pre-C99), the size of the array must be constant. So you have to create an array big enough. However, you should not hope that it fits - this is a very common way to create bugs and vulnerabilities. Instead, you should check if its usage is sensible and if the size is exceeded, generate an error. If you don't, you get undefined behaviour. It may work, or it may not. And it may be exploited for malicious purposes.
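To make the "check, don't hope" advice concrete, here is a minimal sketch of a fixed-size global array with a bounds check. The names MAX_VALUES, values, value_count and store_args are made up for illustration, and the capacity of 64 is an arbitrary assumption.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_VALUES 64            /* hypothetical capacity, pick what suits you */

int values[MAX_VALUES];          /* fixed-size array at file scope */
int value_count = 0;             /* how many slots are in use */

/* Convert each string to an int and store it in values[].
   Returns the number stored, or -1 if the input would not fit. */
int store_args(int count, char *strings[])
{
    int i;

    if (count < 0 || count > MAX_VALUES) {
        fprintf(stderr, "cannot store %d values (max %d)\n", count, MAX_VALUES);
        return -1;               /* refuse instead of overrunning the array */
    }
    for (i = 0; i < count; i++)
        values[i] = atoi(strings[i]);
    value_count = count;
    return value_count;
}
```

From main you would call something like store_args(argc - 1, argv + 1) and bail out if it returns -1.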
The C99 standard supports variable-length arrays, so you can write something like int a[argc];. However, this works only for a local array.
The solution is to create a global pointer and dynamically allocate memory for it in the main function using malloc.
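A rough sketch of the global-pointer approach might look like this. The names arg_values, arg_count and load_args are hypothetical, and atoi is used only to keep the example short (it does no error checking on the input strings).

```c
#include <stdlib.h>

int *arg_values = NULL;   /* global pointer; the storage is allocated at run time */
int arg_count = 0;        /* number of ints behind arg_values */

/* Allocate room for argc - 1 ints and fill it from argv[1..].
   Returns the number of values stored, or -1 if malloc fails. */
int load_args(int argc, char *argv[])
{
    int i;
    int n = argc - 1;                     /* skip argv[0], the program name */

    if (n <= 0)
        return 0;                         /* nothing to store */
    arg_values = malloc((size_t)n * sizeof *arg_values);
    if (arg_values == NULL)
        return -1;
    for (i = 0; i < n; i++)
        arg_values[i] = atoi(argv[i + 1]);
    arg_count = n;
    return n;
}
```

Call load_args(argc, argv) at the top of main; after that every function in the program can read arg_values[0] through arg_values[arg_count - 1]. Remember to free(arg_values) when you are done with it.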

Related

Is it possible in C to create a file scope object whose address may not be computed?

What I want is to ensure that file scope variables (in my program) can not be modified from outside the file. So I declare them as 'static' to preclude external linkage. But I also want to make sure that this variable can not be modified via pointers.
I want something similar to the 'register' storage class, in that
"the address of any part of an object declared with storage-class specifier register cannot be computed, either explicitly (by use of the unary & operator) or implicitly (by converting an array name to a pointer)",
but without the limitations of the 'register' keyword (can not be used on file scope variables, arrays declared as register can not be indexed).
That is,
<new-keyword> int array[SIZE] = {0};
int a = array[0]; /* should be valid */
int *p = array; /* should be INVALID */
p = &array[3]; /* should be INVALID */
What is the best way to go about achieving this goal?
Why do I desire such a feature?
The usage scenario is that this file will be modified by many people in the future even when I can not personally overview all modifications. I want to preclude as many potential bugs as possible. In this case I want to make sure that variables meant to be 'private' to the module will remain so without having to depend just on documentation and/or discipline
No, I don't think you can do so, at least not cleanly. But I also fail to understand your usage case fully.
If your object is static, nobody knows its name outside of your module. So nobody can use & to take its address.
If you need to expose it, and don't want other parts of the program modifying it, write a function that exposes it through a pointer to const:
static int array[SIZE];
const int *get_array(void)
{
    return array;
}
Then compile with warnings. If somebody casts away the const, it's their problem.
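If even a pointer to const is more exposure than you want, one alternative (my own suggestion, not something the standard gives you a keyword for) is to hand out values by index and never return an address at all. A minimal sketch, with SIZE, array_get and array_set as made-up names:

```c
#include <stddef.h>

#define SIZE 8                     /* hypothetical size */

static int array[SIZE] = {0};      /* internal linkage: invisible to other files */

/* Read one element by value. Returns 0 for an out-of-range index. */
int array_get(size_t i)
{
    return i < SIZE ? array[i] : 0;
}

/* Write one element. Returns 0 on success, -1 for an out-of-range index. */
int array_set(size_t i, int value)
{
    if (i >= SIZE)
        return -1;
    array[i] = value;
    return 0;
}
```

This does not stop code inside the same file from writing int *p = array;, but callers in other files only ever see copies of the values.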
Assuming you are concerned with security issues, here are a few things to consider:
The purpose of the register keyword was to hint to the compiler that the variable should be kept in a register because it will be used intensively. Since registers have no memory address, it is impossible to take the address of such a variable (although this wasn't the primary purpose of the keyword; it is merely a side effect). As compilers got better at generating efficient code, the hint is no longer needed.
Even if you could make all objects in your code "address-proof" (impossible to take their address), the program would still not be 100% safe. Those objects are still stored in memory, which is still visible. By analysing the binary files, using debuggers, analysing the memory map and so on, one could find out those memory addresses.
This is not good practice. For someone outside a module to get at an object in it, that object must be global, which is bad. So you should worry about having global variables at all, not about their visibility.
As a semi-solution to your "problem", you can declare them static const. This way they cannot be named from outside the module, and even if their address does leak out, no one is allowed to change their value. The catch is that the module itself cannot modify them either.

Why are we not allowed to have assignment statements in the file scope in C?

Why are we only allowed to declare and define variables in the global section? Why not include assignment in the global section?
Example:
#include<stdio.h>
int a;
a=5;//Valid because its similar to int a=5; Therefore a initialiser to a Tentative definition
a=8;//Invalid because We can have only one initialiser for a tentative definition
void main(){
...
}
Why do we need this? What would be the consequences if we were allowed to have more than one initializer to a tentative definition
My next question is why only constant initializer elements are allowed?
#include<stdio.h>
int i=5;
int j=i+5;//[Error] initializer element is not constant
void main(){
...
}
Similarly what would be the consequences we face if this rule was not present?
Please note my question is not exactly why this happens? I'm trying to figure why these restrictions were given in the first place.
For both questions the answer is the same: at file scope there is no execution of statements or evaluation of expressions; everything is done at compile time.
Other languages (C++ is an example) have a model for dynamic initialization at program startup. This is a complicated issue, e.g. because initializers that come from different compilation units don't have a natural ordering among them, but might implicitly depend on each other. SO is an excellent source of information for this question, too.
C tries to stay simple, simple to use for a programmer and simple to implement for compiler builders.
We are not allowed to use assignment at file scope because program execution starts from main. The toolchain supplies a startup routine (commonly called _start) which runs first and then calls main. When main returns, control goes back to that routine, which performs the proper exit procedure to terminate the program. So anything written outside functions can only be a declaration or an initialization, and those initializations are resolved at compile time.
Initialization is different from declaration and assignment. When we initialize a variable, the compiler arranges for it to already hold that value when program execution starts. When we only declare a variable, it gets the default initial value determined by its storage duration (zero for file-scope and static variables, indeterminate for automatic ones). Assignment is done at run time, not at compile time.
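A small sketch of the difference, using the i and j from the question. BASE, k and init_globals are names I made up for the illustration.

```c
enum { BASE = 5 };      /* an enum constant is a constant expression */

int i = BASE;           /* initializer known at compile time */
int k = BASE + 5;       /* allowed: still a constant expression */
int j;                  /* no initializer: static storage, so it starts at 0 */

/* "j = i + 5;" at file scope would be a statement, which is not allowed there.
   Run-time work has to live inside a function. */
void init_globals(void)
{
    j = i + 5;          /* assignment, executed when the function is called */
}
```

If j really must depend on i, either derive both from the same constant (as k does here) or call something like init_globals() at the start of main.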

C: static int gets strange value

(changed name of variable from original question to fit the actual code)
I'm new to C and I'm implementing a queue.
The error is with the static int head=0 variable. It's incremented by 1 each time dequeue() is called. The error seems to occur when the queue is dequeued and the function get_person() is called. The head variable then appears to get a large random value, like 23423449. I have no idea where this comes from. However, if I get rid of the "static" keyword so the variable is declared as int head=0, it works fine. How come?
using a "global" variable in top of a included file: static int variable1=0
This clearly indicates that you don't understand what the static keyword means at global scope. At global scope, outside of a function, static means that the variable is visible only to the code within the compilation unit in which it is defined.
Now if you define a static variable in a header, each compilation unit that includes that header will have its own variable of that name. So your program is littered with many identically named variables, each specific to the compilation unit it's in.
I think what you actually want is a non-static, extern declaration in the header, and exactly one compilation unit actually defining the variable.
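Since the question's code isn't shown, here is only a sketch of that layout. The file names queue.h and queue.c and the function dequeue_one are hypothetical; the two parts are shown as one listing so it compiles as is, but in a real project they would be separate files.

```c
/* ---- queue.h (hypothetical): declaration only, no storage is reserved ---- */
extern int head;
void dequeue_one(void);

/* ---- queue.c (hypothetical): the one and only definition ---- */
int head = 0;

void dequeue_one(void)
{
    head++;             /* every file that includes queue.h sees this same head */
}
```

Any other .c file that includes the header then refers to the single head defined in queue.c, instead of getting a private copy.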
I think you are overrunning your person array
One of the strcpy functions is going beyond the bounds of the buffers in the person object, and overwriting the head variable. I would guess the tail and nbr_elem are going too.
You should check that the number of characters you are copying does not exceed the buffer lengths, or use strncpy.
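A minimal sketch of a length-checked copy. The struct layout and the name person_set_name are guesses, since the real person struct wasn't posted.

```c
#include <string.h>

/* Hypothetical struct; the question did not show the real one. */
struct person {
    char name[16];
    int  age;
};

/* Copy src into dst->name only if it fits, including the terminating '\0'.
   Returns 0 on success, -1 if src is too long (dst is left untouched). */
int person_set_name(struct person *dst, const char *src)
{
    if (strlen(src) >= sizeof dst->name)
        return -1;
    strcpy(dst->name, src);   /* safe: the length was checked above */
    return 0;
}
```

Note that strncpy on its own does not write a terminating '\0' when the source is as long as or longer than the limit, so if you go that route you have to terminate the buffer yourself.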
If you declare a global static variable in file A.c, it means this variable is only available within the scope of this A.c file. See : http://en.wikipedia.org/wiki/Static_variable
Since you haven't posted any code, and you are mentioning using the same variable in a different file (e.g. B.c), it seems like this is invoking undefined behavior, which would explain the random number your program is printing.
If you wish to use the variable in a different .c file, you should not make it static.
You are calling strcpy without checking that the values you are trying to write will actually fit inside the allocated space inside the person struct.
What is most likely happening is you are writing beyond the allocated memory, and your strcpy is actually overwriting the value of head. strcpy will keep writing until it hits a null terminator ('\0').
If you were to run this in valgrind (a useful tool for finding this type of problem in large programs) it would probably tell you you have invalid writes occurring.
C assumes you know what you are doing, as long as you have access to the memory you can do with it as you please :)

Pointer to function vs global variable

New EE with very little software experience here.
Have read many questions on this site over the last couple years, this would be my first question/post.
Haven't quite found the answer for this one.
I would like to know the difference/motivation between having a function modify a global variable within the body (not passing it as a parameter), and between passing the address of a variable.
Here is an example of each to make it more clear.
Let's say that I'm declaring some functions in "peripheral.c" (with their proper prototypes in "peripheral.h"), and using them in "implementation.c".
Method 1:
//peripheral.c
//macros, includes, etc
void function(uint8 *x){
    //modify *x
}
.
//implementation.c
#include "peripheral.h"
static uint8 var;
function(&var); //this will end up modifying var
Method 2:
//peripheral.c
//macros, includes, etc
void function(void){
    //modify x
}
.
//implementation.c
#include "peripheral.h"
static uint8 x;
function(); //this will modify x
Is the only motivation to avoid using a "global" variable?
(Also, is it really global if it just has file scope?)
Hopefully that question makes sense.
Thanks
The function that receives a parameter pointing to the variable is more general. It can be used to modify a global, a local or indeed any variable. The function that modifies the global can do that task and that task only.
Which is to be preferred depends entirely on the context. Sometimes one approach is better, sometimes the other. It's not possible to say definitively that one approach is always better than the other.
As for whether your global variable really is global, it is global in the sense that there is one single instance of that variable in your process.
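To illustrate the "more general" point, here is a small sketch. The typedef for uint8 is an assumption about what the question's headers define, and reset_x and reset are made-up names.

```c
typedef unsigned char uint8;   /* assumed definition of the question's uint8 */

static uint8 x;                /* file scope, internal linkage */

/* Method 2 style: can only ever touch x. */
void reset_x(void)
{
    x = 0;
}

/* Method 1 style: works on whatever the caller points it at. */
void reset(uint8 *p)
{
    *p = 0;
}
```

reset(&x) does the same job as reset_x(), but reset can also be handed a local variable, an array element or a struct member.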
static variables have internal linkage, they cannot be accessed beyond the translation unit in which they reside.
So if you want to modify a static global variable from another TU, it will have to be passed as a pointer through a function parameter, as in the first example.
Your second example cannot work because x cannot be accessed outside implementation.c; it should give you a compilation error.
Good Read:
What is external linkage and internal linkage?
First of all, in C/C++, "global" does mean file scope (although if you declare a global in a header, then it is included in files that #include that header).
Using pointers as parameters is useful when the calling function has some data that the called function should modify, such as in your examples. Pointers as parameters are especially useful when the function that is modifying its input does not know exactly what it is modifying. For example:
scanf("%d", &foo);
scanf is not going to know anything about foo, and you cannot modify its source code to give it knowledge of foo. However, scanf takes pointers to variables, which allows it to modify the value of any arbitrary variable (of types it supports, of course). This makes it more reusable than something that relies on global variables.
In your code, you should generally prefer to use pointers to variables. However, if you notice that you are passing the same chunk of information around to many functions, a global variable may make sense. That is, you should prefer
int g_state;
int foo(int x, int y);
int bar(int x, int y);
void foobar(void);
...
to
int foo(int x, int y, int state);
int bar(int x, int y, int state);
void foobar(int state);
...
Basically, use globals for values that should be shared by everything in the file they are in (or files, if you declare the global in a header). Use pointers as parameters for values that should be passed between a smaller group of functions for sharing and for situations where there may be more than one variable you wish to do the same operations to.
EDIT: Also, as a note for the future, when you say "pointer to function", people are going to assume that you mean a pointer that points to a function, rather than passing a pointer as a parameter to a function. "pointer as parameter" makes more sense for what you're asking here.
Several different issues here:
In general, "global variables are bad". Don't use them if you can avoid it. Yes, it is preferable to pass a pointer to a variable so a function can modify it than to make it global so the function can implicitly modify it.
Having said that, global variables can be useful: by all means use them as appropriate.
And yes, "global" can mean "between functions" (within a module) as well as "between modules" (global throughout the entire program).
There are several interesting things to note about your code:
a) Most variables are allocated from the "stack". When you declare a variable outside of a function like this, it's allocated in static storage instead: the space exists for the lifetime of the program.
b) When you declare it "static", you "hide" it from other modules: the name is not visible outside of the module.
c) If you wanted a truly global variable, you would not use the keyword "static". And you might declare it "extern uint8 var" in a header file (so all modules would see the declaration).
I'm not sure your second example really works, since you declared x as static (thus limiting its scope to a file), but other than that, there are some advantages to the pointer-passing version:
It gives you more flexibility on allocation and modularity. While you can only have only one copy of a global variable in a file, you can have as many pointers as you want and they can point to objects created at many different places (static arrays, malloc, stack variables...)
Global variables are forced into every function, so you must always be aware that someone might want to modify them. On the other hand, pointers can only be accessed by functions you explicitly pass them to.
In addition to the last point, global variables all share the same scope, and it can get cluttered with too many variables. Pointers, on the other hand, have lexical scoping like normal variables, and their scope is much more restricted.
And yes, things can get somewhat blurry if you have a small, self-contained file. If you aren't ever going to instantiate more than one "object", then sometimes static global variables (that are local to a single file) work just as well as pointers to a struct.
The main problem with global variables is that they promote what's known as "tight coupling" between functions or modules. In your second design, the peripheral module is aware of and dependent on the design of implementation, to the point that if you change implementation by removing or renaming x, you'll break peripheral, even without touching any of its code. You also make it impossible to re-use peripheral independently of the implementation module.
Similarly, this design means function in peripheral can only ever deal with a single instance of x, whatever x represents.
Ideally, a function and its caller should communicate exclusively through parameters, return values, and exceptions (where appropriate). If you need to maintain state between calls, use a writable parameter to store that state, rather than relying on a global.
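As a rough sketch of "state through a writable parameter" (the struct and function names are invented for the example, they are not from the question):

```c
/* Hypothetical state for the peripheral; not from the question. */
struct counter_state {
    unsigned count;
};

/* All state comes in through the parameter, so the function can serve
   any number of independent instances. */
unsigned counter_tick(struct counter_state *s)
{
    return ++s->count;
}
```

Two callers can each keep their own struct counter_state and never interfere with each other, which a single global x cannot offer.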

Segmentation fault while accessing a function-static structure via returned pointer

I have the following structure:
struct sys_config_s
{
    char server_addr[256];
    char listen_port[100];
    char server_port[100];
    char logfile[PATH_MAX];
    char pidfile[PATH_MAX];
    char libfile[PATH_MAX];
    int debug_flag;
    unsigned long connect_delay;
};
typedef struct sys_config_s sys_config_t;
I also have a function defined in a static library (let's call it A.lib):
sys_config_t* sys_get_config(void)
{
    static sys_config_t config;
    return &config;
}
I then have a program (let's call it B) and a dynamic library (let's call it C). Both B and C link with A.lib. At runtime B opens C via dlopen() and then gets an address to C's function func() via a call to dlsym().
void func(void)
{
    sys_get_config()->connect_delay = 1000;
}
The above code is the body of C's func() function and it produces a segmentation fault when reached. The segfault only occurs while running outside of gdb.
Why does that happen?
EDIT: Making sys_config_t config a global variable doesn't help.
The solution is trivial. Somehow, by a header mismatch, the PATH_MAX constant was defined differently in B's and C's compilation units. I need to be more careful in the future. (facepalms)
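For anyone hitting the same thing, here is a sketch of why a differing PATH_MAX matters. The two values 4096 and 1024 are made up, and the structs are trimmed versions of sys_config_s; the point is only that the offset of connect_delay moves when the array sizes change.

```c
#include <stddef.h>

#define PATH_MAX_IN_B 4096   /* hypothetical value seen by B */
#define PATH_MAX_IN_C 1024   /* hypothetical value seen by C */

/* The "same" struct as each compilation unit would lay it out. */
struct config_as_b_sees_it {
    char server_addr[256];
    char logfile[PATH_MAX_IN_B];
    int debug_flag;
    unsigned long connect_delay;
};

struct config_as_c_sees_it {
    char server_addr[256];
    char logfile[PATH_MAX_IN_C];
    int debug_flag;
    unsigned long connect_delay;
};

/* Byte offset of connect_delay under each layout. */
size_t delay_offset_in_b(void)
{
    return offsetof(struct config_as_b_sees_it, connect_delay);
}

size_t delay_offset_in_c(void)
{
    return offsetof(struct config_as_c_sees_it, connect_delay);
}
```

If the object was allocated with one layout and written through the other, the store to connect_delay lands at the wrong offset, possibly outside the object altogether, which is undefined behavior and can show up as a segfault.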
There is no difference between the variable being static-local or static-global. A static variable is STATIC: it is not allocated on the stack in the current function's frame each time the function is called, but in one of the preexisting memory segments defined in the executable's binary headers.
That much I'm 100% sure of. Which segment they are placed in exactly, and whether they are properly shared, is another problem. I've seen similar problems with sharing global/static variables between modules, but usually the core of the problem was very specific to the exact setup.
Please take into consideration that the code sample is small and that I worked on those platforms a long time ago. What I've written above might be mis-worded or even plainly wrong at some points!
I think the important thing is that you are getting the segfault in C when touching that line. Setting an integer field to a constant cannot fail, provided that the target address is valid and not write-protected. That leaves two options:
- either your function sys_get_config() has crashed
- or it has returned an invalid pointer.
Since you say that the segfault is raised here, not in sys_get_config, the only thing left is the latter point: broken pointer.
Add a trivial printf to sys_get_config that dumps the address about to be returned, then do the same in the calling function "func". Check that it is not null, and also check that the value inside sys_get_config is the same as the value after it is returned, just to be sure that the calling conventions are right, etc. A good idea for a double/triple check is to also add inside module "A" a copy of the function sys_get_config (with a different name, of course), and to check whether the addresses returned from sys_get_config and its copy are the same. If they are not, something went very wrong during linking.
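The address dump could look roughly like this. The struct is a trimmed stand-in for the real sys_config_t, and printing to stderr is just my choice so the output is not buffered away before a crash.

```c
#include <stdio.h>

/* Trimmed, hypothetical stand-in for the real sys_config_t. */
typedef struct {
    int debug_flag;
    unsigned long connect_delay;
} sys_config_t;

sys_config_t *sys_get_config(void)
{
    static sys_config_t config;
    /* Dump the address just before returning it. */
    fprintf(stderr, "sys_get_config: returning %p\n", (void *)&config);
    return &config;
}

void func(void)
{
    sys_config_t *cfg = sys_get_config();
    /* Dump what the caller actually received. */
    fprintf(stderr, "func: got %p\n", (void *)cfg);
    if (cfg != NULL)
        cfg->connect_delay = 1000;
}
```

If the two printed addresses differ, or the one in func is null, the pointer is being mangled somewhere between the library and the caller.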
There is also a very, very small chance that the module loading has been deferred, and you are trying to reference memory of a module that has not been fully initialized yet. I worked on Linux a very long time ago, but I remember that dlopen has various loading options. But you wrote that you got the address via dlsym, so I suppose the module has loaded, since you've got the symbol's final address.
