I wrote this code today, just out of experimentation, and I'm trying to figure out the output.
/*
* This code in C attempts to exploit insufficient bounds checking
* to legitimate advantage.
*
* A dynamic structure with the accessibility of an array.
* Handy for small-time code, but largely unreliable.
*/
int array[1] = {0};
int index = 0;
put(), get();
main ( )
{
put(1); put(10), put(100);
printf("%6d %5d %5d\n", get(0), get(1), get(2));
}
put ( x )
int x;
{
array[index++] = x;
}
get ( index )
int index;
{
return array[index];
}
The output:
1 3 100
There is a problem there, in that you declare 'array' as an array of length 1 but you write 3 values to it. It should be at least 'array[3]'. Without that, you are writing to unallocated memory, so anything could happen.
The reason it outputs '3' there without the fix is that it is outputting the value of the global 'index' variable, which is the next int in memory (in your case; as I said, anything could happen). Even though you do overwrite this with your put(10) call, the old index value is used as the index in the assignment and then post-incremented, which sets it back to 2. It then gets set to 3 at the end of the put(100) call and is subsequently output via printf.
It's undefined behavior, so the only real explanation is "It does some things on one machine and other things on other machines".
Also, what's with the K&R function syntax?
EDIT: The printf guess was wrong. As far as the syntax, read K&R 2nd Edition (the cover has a red ANSI stamp), which uses modern function syntax (among other useful updates).
To expand on what has been said, accessing out-of-bounds array members results in undefined behavior. Undefined behavior means that literally anything could happen. There is no way to exploit undefined behavior unless you're deep into esoteric platform-specific hacks. Don't do it.
If you do want a "dynamic array", you'll have to take care of it yourself. If your requirements are simple, you can just malloc and realloc a buffer. If your needs are more complicated, you might want to define a struct that keeps a separate buffer, a size, and a count, and write functions that operate on that struct. If you're just learning, try it both ways.
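As a rough illustration of the struct-based approach described above, here is a minimal sketch. The names int_vec, vec_put, vec_get and vec_free are made up for this example, and the growth policy (start at 4 slots, then double) is just one reasonable assumption.

```c
#include <stdlib.h>

/* Hypothetical growable int array: a buffer, its capacity, and a count. */
struct int_vec {
    int    *data;   /* heap buffer, NULL while empty */
    size_t  cap;    /* number of slots allocated */
    size_t  count;  /* number of slots in use */
};

/* Append x, growing the buffer when it is full.
 * Returns 0 on success, -1 if realloc fails. */
int vec_put(struct int_vec *v, int x)
{
    if (v->count == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;            /* the old buffer is still valid */
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->count++] = x;
    return 0;
}

/* Bounds-checked read: stores the value in *out and returns 0,
 * or returns -1 if i is out of range. */
int vec_get(const struct int_vec *v, size_t i, int *out)
{
    if (i >= v->count)
        return -1;
    *out = v->data[i];
    return 0;
}

/* Release the buffer and reset the struct to the empty state. */
void vec_free(struct int_vec *v)
{
    free(v->data);
    v->data = NULL;
    v->cap = v->count = 0;
}
```

Starting from struct int_vec v = {0};, three vec_put calls followed by vec_get(&v, 2, &out) would give back the third value, and vec_get(&v, 3, &out) would report an error instead of reading past the end.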
Finally, your function declaration syntax is valid, but archaic. That form is rarely seen, and virtually unheard of in new code. Declare put as:
int put(int x) {…}
And declare main as either int main(void) or:
int main(int argc, char **argv) {…}
The names of argc and argv aren't important, but the types are. If you use some other signature, demons could fly out of your nose.
I have a question about this code below:
#include <stdio.h>
char abcd(char array[]);
int main(void)
{
char array[4] = { 'a', 'b', 'c', 'd' };
printf("%c\n", abcd(array));
return 0;
}
char abcd(char array[])
{
char *p = array;
while (*p) {
putchar(*p);
p++;
}
putchar(*p);
putchar(p[4]);
return *p;
}
Why isn't a segmentation fault generated when this program reaches putchar(*p) right after exiting the while loop? I thought that once p goes past array[3], no value has been assigned to the memory locations that follow. For example, I thought accessing p[4] would be illegal because it is out of bounds. On the contrary, this program runs with no errors. Is this because any memory that has not been assigned a value (in this case, any memory outside array) is null, with the value '\0'?
OP seems to think that when an array is accessed out of bounds, something special should happen.
Accessing outside array bounds is undefined behavior (UB). Anything may happen.
Let's clarify what undefined behavior is.
The C standard is a contract between the developer and the compiler as to what the code means. However, it just so happens that you can write things that are just outside what is defined by the standard.
One of the most common cases is trying to do out-of-bounds access. Other languages say that this should result in an exception or another error. C does not. An argument is that it would imply adding costly checks at every array access.
The compiler does not know that what you are writing is undefined behavior¹. Instead, the compiler assumes that what you write contains no undefined behavior, and translates your code to assembly accordingly.
If you want an example, compile the code below with or without optimizations:
#include <stdio.h>
int table[4] = {0, 0, 0, 0};
int exists_in_table(int v)
{
for (int i = 0; i <= 4; i++) {
if (table[i] == v) {
return 1;
}
}
return 0;
}
int main(void) {
printf("%d\n", exists_in_table(3));
}
Without optimizations, the assembly I get from gcc does what you might expect: it simply reads one element past the end of the array, which might cause a segfault if the array is allocated right before a page boundary.
With optimizations, however, the compiler looks at your code and notices that it cannot exit the loop (otherwise, it would try to access table[4], which cannot be), so the function exists_in_table necessarily returns 1. And we get the following, valid, implementation:
exists_in_table(int):
mov eax, 1
ret
Undefined behavior means exactly that: undefined. Such bugs are very tricky to detect, since they can be virtually invisible after compiling. You need an advanced static analyzer to interpret the C source code and work out whether what it does can be undefined behavior.
¹ in the general case, that is; modern compilers use some basic static analyzer to detect the most common errors
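For comparison, here is a sketch of the same lookup with the off-by-one removed. The TABLE_LEN macro is an addition for this example, so that the bound is derived from the array rather than repeated by hand; with i < TABLE_LEN the loop never touches table[4], and the optimizer has no undefined behavior to reason from.

```c
#include <stddef.h>

static const int table[4] = {0, 0, 0, 0};

/* Element count computed from the array itself (hypothetical helper macro). */
#define TABLE_LEN (sizeof table / sizeof table[0])

/* Returns 1 if v occurs in table, 0 otherwise. */
int exists_in_table(int v)
{
    for (size_t i = 0; i < TABLE_LEN; i++) {   /* < rather than <= */
        if (table[i] == v)
            return 1;
    }
    return 0;
}
```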
C does no bounds checking on array accesses; because of how arrays and array subscripting are implemented, the language has no information with which to do any. It simply doesn't know that you've run past the end of the array. The operating environment will typically raise a runtime error if you touch memory that isn't mapped into your process, but up until that point you can read or clobber any memory following the end of the array.
The behavior on subscripting past the end of the array is undefined - the language definition does not require the compiler or the operating environment to handle it any particular way. You may get a segfault, you may get corrupted data, you may clobber a frame pointer or return instruction address and put your code in a bad state, or it may work exactly as expected.
There are a few points to note about your program:
The array in main and the array in abcd are different things. In main, it is an array of 4 elements; in abcd, it is a parameter declared with array type, which is really a pointer. If you write something like array[4] inside main, the compiler can warn about it, but there will be no warning for the same access inside abcd.
p is a pointer to the first element of array. In C, nothing limits how far p can be advanced. Your program is lucky that the memory after array happens to contain a 0 byte, which stops the while (*p) loop. If you check the value of p after the loop, it might not even equal &array[4].
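Since the function cannot discover the array's length on its own, one common way around this is to pass the length explicitly. The sketch below assumes a hypothetical print_chars function standing in for abcd; it never reads beyond the n elements the caller promises are there.

```c
#include <stdio.h>
#include <stddef.h>

/* Print the first n characters of array.
 * Returns the last character printed, or '\0' if n is 0. */
char print_chars(const char *array, size_t n)
{
    char last = '\0';
    for (size_t i = 0; i < n; i++) {
        putchar(array[i]);
        last = array[i];
    }
    return last;
}
```

In main, the call would be print_chars(array, sizeof array), which works there because array is still a real array of 4 chars at that point. The other option is to store a terminating '\0' in the array (for example char array[5] = "abcd";) so that a while (*p) loop has something defined to stop on.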
Could you please tell me why I get different output when running these two pieces of code?
void UART_OutString(unsigned char buffer[]){
int i;
while(buffer[i]){
UART_OutChar(buffer[i]);
i++;
}
}
and
void UART_OutString(unsigned char buffer[]){
int i = 0;
while(buffer[i]){
UART_OutChar(buffer[i++]);
}
}
regards, Genadi
You didn't initialize the i variable in the first case, so it's an uninteresting typo bug that your compiler ought to warn you about...
That being said, we can apply the KISS principle and rewrite the whole code in the most readable way possible, a for loop, which by its nature makes it very hard to forget to initialize the loop iterator:
void UART_OutString(const char *buf){
for(int i=0; buf[i]!='\0'; i++){
UART_OutChar(buf[i]);
}
}
As it turns out, the most readable way is very often the fastest way possible too.
(However, int might be inefficient on certain low-end systems, so if you are fine with only using strings with length 255 or less, uint8_t i would be a better choice. Embedded systems should never use int and always the stdint.h types.)
For what it's worth, I'd implement this as
void UART_OutChar(unsigned char c);
void UART_OutString(unsigned char buffer[]){
for(unsigned char *p = buffer; *p; p++) {
UART_OutChar(*p);
}
}
to avoid the separate counter variable at all.
It is always a good idea to initialize local variables, especially in C where you should assume that nothing is done for you (because that's usually the case). There is a reason why regulated languages would not allow you to do this.
I believe reading the uninitialized variable results in undefined behaviour (effectively, C doesn't know there isn't meant to be anything there and will just grab whatever is in that storage), which means it is completely unpredictable.
This can also cause all kinds of problems because you then index an array with it, and C will not stop you from indexing an array out of bounds. If the random i value C happens to grab is larger than the size of the array, you get undefined behaviour in what buffer[i] returns. This one can be particularly nasty, as it could cause an invalid memory read or segmentation fault, or crash your program, depending on what it ends up reading.
Therefore uninitialized i = random behaviour, and you then get more random behaviour from using that i value to index your array.
I believe these are about all the reasons that this is a bad idea. In C it is particularly important to pay attention to things like this, as the compiler will often let you compile and run such code.
Both initialising i and using the solution in @AKX's answer are good fixes, although I thought this would better answer your question of why the two versions behave differently. The real answer is that the first version's behaviour is effectively random.
I was reading through some source code and found a functionality that basically allows you to use an array as a linked list? The code works as follows:
#include <stdio.h>
int
main (void)
{
int *s;
for (int i = 0; i < 10; i++)
{
s[i] = i;
}
for (int i = 0; i < 10; i++)
{
printf ("%d\n", s[i]);
}
return 0;
}
I understand that s points to the beginning of an array in this case, but the size of the array was never defined. Why does this work and what are the limitations of it? Memory corruption, etc.
Why does this work
It does not, it appears to work (which is actually bad luck).
and what are the limitations of it? Memory corruption, etc.
Undefined behavior.
Keep in mind: In your program whatever memory location you try to use, it must be defined. Either you have to make use of compile-time allocation (scalar variable definitions, for example), or, for pointer types, you need to either make them point to some valid memory (address of a previously defined variable) or, allocate memory at run-time (using allocator functions). Using any arbitrary memory location, which is indeterminate, is invalid and will cause UB.
I understand that s points to the beginning of an array in this case
No, the pointer has automatic storage duration and was not initialized:
int *s;
So it has an indeterminate value and points nowhere.
but the size of the array was never defined
There is no array declared or defined in the program.
Why does this work and what are the limitations of it?
It works by chance. That is, it produced the expected result when you ran it. But the program actually has undefined behavior.
As I first pointed out in the comments, what you are doing does not work; it seems to work, but it is in fact undefined behaviour.
In computer programming, undefined behavior (UB) is the result of
executing a program whose behavior is prescribed to be unpredictable,
in the language specification to which the computer code adheres.
Hence, it might "work" sometimes, and sometimes not. Consequently, one should never rely on such behaviour.
If it were that easy to allocate a dynamic array in C, why would anyone use malloc?! Try it with a value bigger than 10 to increase the likelihood of a segmentation fault.
See the SO thread for how to properly allocate an array in C.
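To make the contrast concrete, here is a minimal sketch of the program's first loop with s pointing at memory that was actually allocated. The function name make_sequence is made up for this example.

```c
#include <stdlib.h>

/* Returns a heap array holding 0, 1, ..., n-1, or NULL if malloc fails.
 * The caller owns the result and must free() it. */
int *make_sequence(size_t n)
{
    int *s = malloc(n * sizeof *s);   /* s now points at n ints of real storage */
    if (s == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        s[i] = (int)i;
    return s;
}
```

A caller would do int *s = make_sequence(10);, check the result against NULL, print s[0] through s[9] as before, and finish with free(s).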
When writing a program in which I ask the user to enter number N, which I have to use to allocate the memory for an int array, what is the correct way to handle this:
First approach:
int main() {
int array[],n;
scanf("%d\n",&n);
array = malloc(n * sizeof(int));
}
or the second approach:
int main() {
int n;
scanf("%d\n",&n);
int array[n];
}
Either one will work (though the first case needs to be changed from int array[] to int *array); the difference depends on where the array is stored.
In the first case, the array will be stored in the heap, while in the second case, it'll (most likely) be stored on the stack. When it's stored on the stack, the maximum size of the array will be much more limited based on the limit of the stack size. If it's stored in the heap, however, it can be much larger.
Your second approach is called a variable length array (VLA), and is supported only as of C99. This means that if you intend your code to be compatible with older compilers (or to be read and understood by older people..), you may have to fall back to the first option, which is more standard. Note that dynamically allocated data requires proper maintenance, the most important part of that being freeing it when you're done (which you don't do in your program).
Assuming you meant to use int *array; instead of int array[]; (the latter wouldn't compile).
Always use the first approach unless you know the array size is going to be very small and you have intimate knowledge of the platforms you will be running on. Naturally, the question arises: how small is small enough?
The main problem with the second approach is that there's no portable way to verify whether the VLA (Variable Length Array) allocation succeeded. The advantage is that you don't have to manage the memory, but that's hardly an "advantage" considering the risk of undefined behaviour if the allocation fails.
It was introduced in C99 and was made optional in C11. That suggests the committee found it not so useful. Also, C11 compilers may not support it, so you have to check whether __STDC_NO_VLA__ is defined.
Automatic storage allocation for an array even as small as int my_arr[10]; could fail. This is an extreme and unrealistic example on modern operating systems, but possible in theory. So I suggest avoiding VLAs in any serious project.
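One way the advice above might look in practice is sketched below, assuming a C99 or later compiler. The names sum_to, fill_and_sum and VLA_MAX_ELEMS are invented for this example, and the cap of 64 elements is an arbitrary assumption about what counts as "very small". A VLA is used only when the compiler supports it and the size is under the cap; everything else goes through malloc, where failure can be detected.

```c
#include <stdlib.h>

#define VLA_MAX_ELEMS 64   /* hypothetical cap for stack allocation */

/* Fill a[0..n-1] with 1..n and return the sum of the elements. */
static long fill_and_sum(int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        a[i] = (int)(i + 1);
        total += a[i];
    }
    return total;
}

/* Sum of 1..n computed through a scratch array.
 * Returns -1 if heap allocation fails. */
long sum_to(size_t n)
{
    if (n == 0)
        return 0;              /* a zero-length VLA would be undefined */

#ifndef __STDC_NO_VLA__
    if (n <= VLA_MAX_ELEMS) {
        int scratch[n];        /* VLA: no way to detect failure here */
        return fill_and_sum(scratch, n);
    }
#endif

    int *scratch = malloc(n * sizeof *scratch);
    if (scratch == NULL)
        return -1;             /* heap failure is detectable */
    long total = fill_and_sum(scratch, n);
    free(scratch);
    return total;
}
```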
You did say you wanted a COMMAND LINE parameter:
int main (int argc, char **argv)
{
int *array ;
int count ;
if (argc < 2)
return 1 ;
count = atoi (argv[1]) ;
array = malloc (sizeof(int)*count) ;
. . . . .
free (array) ;
return 0 ;
}
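The atoi call above gives no indication when the argument is not a number, and a negative or zero count would then go straight into malloc. A possible hardening, sketched here with a hypothetical parse_count helper built on the standard strtol, rejects such input before any allocation happens.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a positive element count from text.
 * Returns 0 and stores the count in *out on success, -1 on bad input. */
int parse_count(const char *text, size_t *out)
{
    char *end;
    errno = 0;
    long v = strtol(text, &end, 10);
    if (end == text || *end != '\0')   /* no digits, or trailing junk */
        return -1;
    if (errno == ERANGE || v <= 0)     /* out of range, zero, or negative */
        return -1;
    *out = (size_t)v;
    return 0;
}
```

In main, this would replace the atoi line: call parse_count(argv[1], &count) with count declared as size_t, return 1 if it fails, and also check the malloc result against NULL before using the array.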
I've been trying to get into the habit of defining trivial variables at the point they're needed. I've been cautious about writing code like this:
while (n < 10000) {
int x = foo();
[...]
}
I know that the standard is absolutely clear that x exists only inside the loop, but does this technically mean that the integer will be allocated and deallocated on the stack with every iteration? I realise that an optimising compiler isn't likely to do this, but is that guaranteed?
For example, is it ever better to write:
int x;
while (n < 10000) {
x = foo();
[...]
}
I don't mean with this code specifically, but in any kind of loop like this.
I did a quick test with gcc 4.7.2 for a simple loop differing in this way and the same assembly was produced, but my real question is: are these two identical according to the standard?
Note that "allocating" automatic variables like this is pretty much free; on most machines it's either a single-instruction stack pointer adjustment, or the compiler uses registers in which case nothing needs to be done.
Also, since the variable remains in scope until the loop exits, there's absolutely no reason to "delete" it (i.e. readjust the stack pointer) until the loop exits. I certainly wouldn't expect any per-iteration overhead for code like this.
Also, of course, the compiler is free to "move" the allocation out of the loop altogether if it feels like it, making the code equivalent to your second example with the int x; before the while. The important thing is that the first version is easier to read and more tightly localized, i.e. better for humans.
Yes, the variable x inside the loop is technically defined on each iteration, and initialized via the call to foo() on each iteration. If foo() produces a different answer each time, this is fine; if it produces the same answer each time, it is an optimization opportunity – move the initialization out of the loop. For a simple variable like this, the compiler typically just reserves sizeof(int) bytes on the stack — if it can't keep x in a register — that it uses for x when x is in scope, and may reuse that space for other variables elsewhere in the same function. If the variable was a VLA — variable length array — then the allocation is more complex.
The two fragments in isolation are equivalent, but the difference is the scope of x. In the example with x declared outside the loop, the value persists after the loop exits. With x declared inside the loop, it is inaccessible once the loop exits. If you wrote:
{
int x;
while (n < 10000)
{
x = foo();
...other stuff...
}
}
then the two fragments are near enough equivalent. At the assembler level, you'll be hard pressed to spot the difference in either case.
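The scope difference can be seen in a small sketch. Here foo is a hypothetical stand-in that returns 1, 2, 3, ... on successive calls, and last_value_seen is a made-up wrapper around the loop; only the variable declared outside the loop is still usable once the loop has finished.

```c
/* Hypothetical stand-in for foo(): returns 1, 2, 3, ... on successive calls. */
static int counter;
static int foo(void) { return ++counter; }

/* Runs the loop limit times and returns the last value foo() produced,
 * or 0 if the loop body never ran. */
int last_value_seen(int limit)
{
    int x = 0;              /* declared outside: still visible after the loop */
    int n = 0;
    while (n < limit) {
        int y = foo();      /* declared inside: scoped to the loop body */
        x = y;
        n++;
    }
    /* y cannot be named here; x still holds the last value assigned */
    return x;
}
```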
My personal point of view is that once you start worrying about such micro-optimisations, you're doomed to failure. The gain is:
a) Likely to be very small
b) Non-portable
I'd stick with code that makes your intention clear (i.e. declare x inside the loop) and let the compiler care about efficiency.
There is nothing in the C standard that says how the compiler should generate code in either case. It could adjust the stack pointer on every iteration of the loop if it fancies.
That being said, unless you start doing something crazy with VLAs like this:
void bar(char *, char *);
void
foo(int x)
{
int i;
for (i = 0; i < x; i++) {
char a[i], b[x - i];
bar(a, b);
}
}
the compiler will most likely just allocate one big stack frame at the beginning of the function. It's harder to generate code for creating and destroying variables in blocks instead of just allocating all you need at the beginning of the function.