I am trying to get my feet wet with C, and wrote this program that displays a KB of my RAM from a random location. Here is the code, and it works fine:
#include <stdio.h>

int main(){
    char *mem;
    for(int i = 0; i < 1024; i++){
        mem++;
        printf("%c", *mem);
    }
    return 0;
}
After that, I made the following change to my code, and now I get segfaults every time I run the program:
#include <stdio.h>

// Just added this signature
int main(int argc, char *argv[]){
    char *mem;
    for(int i = 0; i < 1024; i++){
        mem++;
        printf("%c", *mem);
    }
    return 0;
}
My spider sense tells me that the segfaults I get should be random, and should also occur in the first example, but running the two programs again and again makes it look like predictable behaviour.
$ gcc -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include/c++/4.2.1
Apple LLVM version 7.3.0 (clang-703.0.31)
Target: x86_64-apple-darwin15.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Both your snippets invoke undefined behavior: you go out of bounds (mem++, with no allocation) and use uninitialized values (accessing *mem).
Remember, pointers do not magically inherit (or acquire) memory; in general, you need to make a pointer point to something valid.
The value of mem is undefined (not initialized), but not random. If other C runtime functions are called before main, the slot of stack used by mem may happen to hold a valid pointer. Adding parameters to main changes which slot is used, which changes the behaviour. This can mean the code doesn't crash, although it is still not correct.
You need to initialize mem. I guess you're trying to just read random memory, but that isn't allowed. For example, you may be trying to read memory that's used by a different process, or you may be trying to read some address that doesn't even exist in your computer.
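If the goal is just to print a kilobyte of memory the program legitimately owns, a minimal sketch could look like this (using calloc so the bytes read are at least defined; this is illustrative, not a way to peek at arbitrary RAM):
#include <stdio.h>
#include <stdlib.h>

int main(void){
    /* 1 KB the program actually owns; calloc zero-fills it, since
       reading truly uninitialized bytes is at best indeterminate. */
    char *mem = calloc(1024, 1);
    if (mem == NULL)
        return 1;
    for (int i = 0; i < 1024; i++)
        printf("%c", mem[i]);
    free(mem);
    return 0;
}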
By changing the signature of main, you've changed which junk value ends up in mem to start with. The way it probably works is that mem was picking up a leftover value from some register; when you modified the function signature, argc and argv now occupy those registers, so mem gets a different junk value, this time from the stack. In any case, you shouldn't try to follow a junk pointer.
Just because it works in one example only means you got lucky. You still should not do it; it's very likely it would stop working if any little thing were changed.
You never initialize mem, so its contents are undefined. When you attempt to either increment it with ++ or dereference the pointer, you get undefined behavior.
One of the things that can happen with undefined behavior is that a program may appear to work normally, and making a seemingly unrelated change will cause a crash.
Related
I was reading through some source code and found functionality that basically lets you use a bare pointer as an array. The code works as follows:
#include <stdio.h>

int
main (void)
{
  int *s;
  for (int i = 0; i < 10; i++)
    {
      s[i] = i;
    }
  for (int i = 0; i < 10; i++)
    {
      printf ("%d\n", s[i]);
    }
  return 0;
}
I understand that s points to the beginning of an array in this case, but the size of the array was never defined. Why does this work and what are the limitations of it? Memory corruption, etc.
Why does this work
It does not, it appears to work (which is actually bad luck).
and what are the limitations of it? Memory corruption, etc.
Undefined behavior.
Keep in mind: whatever memory location your program tries to use must be defined. Either use compile-time allocation (scalar variable definitions, for example), or, for pointer types, make the pointer point to some valid memory (the address of a previously defined variable), or allocate memory at run time (using allocator functions). Using an arbitrary, indeterminate memory location is invalid and will cause UB.
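For instance, a short sketch of those three valid options (names are illustrative):
#include <stdlib.h>

int main(void)
{
    int value = 42;
    int fixed[10];                      /* compile-time allocation */

    int *p1 = &value;                   /* points to an existing variable */
    int *p2 = fixed;                    /* points to an existing array */
    int *p3 = malloc(10 * sizeof *p3);  /* run-time allocation */
    if (p3 == NULL)
        return 1;

    *p1 = 1; p2[0] = 2; p3[0] = 3;      /* all three are now valid to use */
    free(p3);
    return 0;
}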
I understand that s points to the beginning of an array in this case
No, the pointer has automatic storage duration and was not initialized:
int *s;
So it has an indeterminate value and points nowhere.
but the size of the array was never defined
There is no array declared or defined in the program.
Why does this work and what are the limitations of it?
It works by chance; that is, it produced the expected result when you ran it. But the program actually has undefined behavior.
As I first pointed out in the comments, what you are doing does not work; it seems to work, but it is in fact undefined behaviour.
In computer programming, undefined behavior (UB) is the result of
executing a program whose behavior is prescribed to be unpredictable,
in the language specification to which the computer code adheres.
Hence, it might "work" sometimes, and sometimes not. Consequently, one should never rely on such behaviour.
If it were that easy to allocate a dynamic array in C, why would anyone use malloc?! Try it with a value much bigger than 10 to increase the likelihood of a segmentation fault.
Look into this SO thread to see how to properly allocate an array in C.
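For reference, a minimal corrected sketch of the snippet above, using malloc (one possible fix among several):
#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  int *s = malloc (10 * sizeof *s);  /* now s points at real storage */
  if (s == NULL)
    return 1;
  for (int i = 0; i < 10; i++)
    s[i] = i;
  for (int i = 0; i < 10; i++)
    printf ("%d\n", s[i]);
  free (s);
  return 0;
}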
Having recently switched to C, I've been told a thousand ways to Sunday that referencing a value that hasn't been initialized isn't good practice and leads to unexpected behavior. Specifically (because my previous language initializes integers to 0), I was told that integers might not be equal to zero when uninitialized. So I decided to put that to the test.
I wrote the following piece of code to test this claim:
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
#include <assert.h>
int main(){
    size_t counter = 0;
    size_t testnum = 2000; // The number of ints to allocate and test.
    for(size_t i = 0; i < testnum; i++){
        int* temp = malloc(sizeof(int));
        assert(temp != NULL); // Just in case there's no space.
        if(*temp == 0) counter++;
    }
    printf("%zu\n", counter); // %zu: counter is a size_t
    return 0;
}
I compiled it like so (in case it matters):
gcc -std=c99 -pedantic name-of-file.c
Based on what my instructors had said, I expected temp to point to a random integer, and that the counter would not be incremented very often. However, my results blow this assumption out of the water:
testnum: || code returns:
2 2
20 20
200 200
2000 2000
20000 20000
200000 200000
2000000 2000000
... ...
The results go on for a couple more powers of 10 (*2), but you get the point.
I then tested a similar version of the code above, but first I allocated an integer array, set every even index to one plus its previous (uninitialized) value, freed the array, and then ran the code above, testing the same number of integers as the size of the array (i.e. testnum). Those results are much more interesting:
testnum: || code returns:
2 2
20 20
200 175
2000 1750
20000 17500
200000 200000
2000000 2000000
... ...
Based on this, it's reasonable to conclude that C reuses freed memory (obviously), and sets some of those new integer pointers to point to addresses that contain the previously incremented integers. My question is why all of my integer pointers in the first test consistently point to 0. Shouldn't they point to whatever empty space on the heap my computer has offered the program, which could (and should, at some point) contain non-zero values?
In other words, why does it seem like all of the new heap space my C program has access to has been wiped to all 0s?
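(The second test isn't shown; a hypothetical reconstruction from the description above, with guessed names and details, might be:)
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

int main(void){
    size_t counter = 0;
    size_t testnum = 2000;

    /* Reconstructed second test: bump every even index of an
       uninitialized array (itself UB), free it, then rerun the
       original counting loop. */
    int *arr = malloc(testnum * sizeof *arr);
    assert(arr != NULL);
    for (size_t i = 0; i < testnum; i += 2)
        arr[i] = arr[i] + 1; // increments an indeterminate value
    free(arr);

    for (size_t i = 0; i < testnum; i++){
        int* temp = malloc(sizeof(int));
        assert(temp != NULL);
        if(*temp == 0) counter++;
    }
    printf("%zu\n", counter);
    return 0;
}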
As you already know, you are invoking undefined behavior, so all bets are off. To explain the particular results you are observing ("why is uninitialized memory that I haven't written to all zeros?"), you first have to understand how malloc works.
First of all, malloc does not just directly ask the system for a page whenever you call it. It has an internal "cache" from which it can hand you memory. Let's say you call malloc(16) twice. The first time you call malloc(16), it will scan the cache, see that it's empty, and request a fresh page (4KB on most systems) from the OS. It then splits this page into two chunks, gives you the smaller chunk, and saves the other chunk in its cache. The second time you call malloc(16), it will see that it has a large enough chunk in its cache, and allocate memory by splitting that chunk again.
Freeing memory simply returns it to the cache. There, it may (or may not) be merged with other chunks to form a bigger chunk, and it is then used for other allocations. Depending on the details of your allocator, it may also choose to return free pages to the OS when possible.
Now the second piece of the puzzle -- any fresh pages you obtain from the OS are filled with 0s. Why? Imagine it simply handed you an unused page that was previously used by some other process that has now terminated. Now you have a security problem, because by scanning that "uninitialized memory", your process could potentially find sensitive data such as passwords and private keys that were used by the previous process. Note that there is no guarantee by the C language that this happens (it may be guaranteed by the OS, but the C specification doesn't care). It's possible that the OS filled the page with random data, or didn't clear it at all (especially common on embedded devices).
Now you should be able to explain the behavior you're observing. The first time, you are obtaining fresh pages from the OS, so they are empty (again, this is an implementation detail of your OS, not the C language). However, if you malloc, free, then malloc again, there is a chance that you are getting back the same memory that was in the cache. This cached memory is not wiped, since the only process that could have written to it was your own. Hence, you just get whatever data was previously there.
Note: this explains the behavior for your particular malloc implementation. It doesn't generalize to all malloc implementations.
First off, you need to understand that C is a language described in a standard and implemented by several compilers (gcc, clang, icc, ...). In several cases, the standard says that certain expressions or operations result in undefined behavior.
What is important to understand is that this means you have no guarantees on what the behavior will be. In fact any compiler/implementation is basically free to do whatever it wants!
In your example, this means you cannot make any assumptions about what the uninitialized memory will contain. Assuming it will be random, or that it will contain elements of a previously freed object, is just as wrong as assuming it will be zero, because any of those could happen at any time.
Many compilers (or OSes) will consistently do the same thing (such as the 0s you observe), but that is also not guaranteed.
(To maybe see different behaviors, try using a different compiler or different flags.)
Undefined behavior does not mean "random behavior" nor does it mean "the program will crash." Undefined behavior means "the compiler is allowed to assume that this never happens," and "if this does happen, the program could do anything." Anything includes doing something boring and predictable.
Also, the implementation is allowed to define any instance of undefined behavior. For instance, ISO C never mentions the header unistd.h, so #include <unistd.h> has undefined behavior, but on an implementation conforming to POSIX, it has well-defined and documented behavior.
The program you wrote is probably observing uninitialized malloced memory to be zero because, nowadays, the system primitives for allocating memory (sbrk and mmap on Unix, VirtualAlloc on Windows) always zero out the memory before returning it. That's documented behavior for the primitives, but it is not documented behavior for malloc, so you can only rely on it if you call the primitives directly. (Note that only the malloc implementation is allowed to call sbrk.)
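For instance, here is a minimal sketch that calls a primitive directly (mmap with MAP_ANONYMOUS, a widespread Unix extension documented to return zero-filled pages; this is an OS guarantee, not a C-language one):
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* Ask the OS for a fresh page directly. */
    unsigned char *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return 1;
    printf("first byte: %d\n", page[0]); /* documented to print 0 */
    munmap(page, 4096);
    return 0;
}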
A better demonstration is something like this:
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    {
        int *x = malloc(sizeof(int));
        *x = 0xDEADBEEF;
        free(x);
    }
    {
        int *y = malloc(sizeof(int));
        printf("%08X\n", *y);
    }
    return 0;
}
which has pretty good odds of printing "DEADBEEF" (but is allowed to print 00000000, or 5E5E5E5E, or make demons fly out of your nose).
Another better demonstration would be any program that makes a control-flow decision based on the value of an uninitialized variable, e.g.
int foo(int x)
{
    int y;
    if (y == 5)
        return x;
    return 0;
}
Current versions of gcc and clang will generate code that always returns 0, but the current version of ICC will generate code that returns either 0 or the value of x, depending on whether register EDX happens to equal 5 when the function is called. Both possibilities are correct, and so would be generating code that always returns x, and so would be generating code that makes demons fly out of your nose.
Useless deliberations, wrong assumptions, wrong test. In your test, every malloc(sizeof(int)) hands you fresh memory. To see the UB you wanted to see, you should put something into that allocated memory and then free it; otherwise you never reuse it, you just leak it. Most OSes clear all the memory allocated to a program before executing it, for security reasons (so when your program starts, everything is zeroed or initialised to the static values).
Change your program to:
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

int main(){
    size_t counter = 0;
    size_t testnum = 2000; // The number of ints to allocate and test.
    for(size_t i = 0; i < testnum; i++){
        int* temp = malloc(sizeof(int));
        assert(temp != NULL); // Just in case there's no space.
        if(*temp == 0) counter++;
        *temp = rand();      // dirty the memory...
        free(temp);          // ...then hand it back for reuse
    }
    printf("%zu\n", counter);
    return 0;
}
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int *a;
    a = (int *)malloc(100*sizeof(int));
    int i = 0;
    for (i = 0; i < 100; i++)
    {
        a[i] = i+1;
        printf("a[%d] = %d \n", i, a[i]);
    }
    a = (int*)realloc(a, 75*sizeof(int));
    for (i = 0; i < 100; i++)
    {
        printf("a[%d] = %d \n", i, a[i]);
    }
    free(a);
    return 0;
}
In this program I expected a segmentation fault, because I'm trying to access elements of the array that were freed using realloc(). But the output is pretty much the same, except for a few of the final elements!
So my doubt is whether the memory is actually getting freed. What exactly is happening?
The way realloc works is that it guarantees that a[0]..a[74] will have the same values after the realloc as they did before it.
However, the moment you try to access a[75] after the realloc, you have undefined behaviour. This means that the program is free to behave in any way it pleases, including segfaulting, printing out the original values, printing out some random values, not printing anything at all, launching a nuclear strike, etc. There is no requirement for it to segfault.
So my doubt is whether the memory is actually getting freed?
There is absolutely no reason to think that realloc is not doing its job here.
What exactly is happening?
Most likely, the memory is getting freed by shrinking the original memory block and not wiping out the now unused final 25 array elements. As a result, the undefined behaviour manifests itself by printing out the original values. It is worth noting that even the slightest change to the code, the compiler, the runtime library, the OS, etc. could make the undefined behaviour manifest itself differently.
You may get a segmentation fault, but you may not. The behaviour is undefined, which means anything can happen, but I'll attempt to explain what you might be experiencing.
There's a mapping between your virtual address space and physical pages, and that mapping usually works in pages of at least 4096 bytes (well, there's swapping too, but let's ignore that for the moment).
You get a segmentation fault if you attempt to access virtual address space that doesn't map to a physical page. Your call to realloc may not have resulted in a physical page being returned to the system, so the page is still mapped to your program and can be used. However, a following call to malloc could hand out that space, or the system could reclaim it at any time. In the former case you'd possibly overwrite another variable; in the latter case you'd segfault.
Accessing an array beyond its bounds is undefined behaviour. You might encounter a runtime error. Or you might not. The memory manager may well have decided to re-use the original block of memory when you re-sized. But there's no guarantee of that. Undefined behaviour means that you cannot reason about or predict what will happen. There's no grounds for you to expect anything to happen.
Simply put, don't access beyond the end of the array.
Some other points:
The correct main declaration here is int main(void).
Casting the value returned by malloc is not needed and can mask errors. Don't do it.
Always store the return value of realloc in a separate variable, so that you can detect a NULL return and avoid losing and leaking the original block. See the sketch below.
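Putting those points together, a corrected sketch of the program might look like this:
#include <stdio.h>
#include <stdlib.h>

int main(void)                           /* (void), per the first point */
{
    int *a = malloc(100 * sizeof *a);    /* no cast needed in C */
    if (a == NULL)
        return 1;
    for (int i = 0; i < 100; i++)
        a[i] = i + 1;
    int *tmp = realloc(a, 75 * sizeof *tmp);
    if (tmp == NULL) {                   /* on failure, a is still valid */
        free(a);
        return 1;
    }
    a = tmp;
    for (int i = 0; i < 75; i++)         /* only 75 elements exist now */
        printf("a[%d] = %d\n", i, a[i]);
    free(a);
    return 0;
}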
I'm learning C and trying to build a dynamic array. I found a great tutorial on this, but I don't get it all the way. The code I have now is:
#include <stdio.h>
#include <stdlib.h>

typedef struct{
    int size;
    int capacity;
    char *data;
}Brry;

void brry_init(Brry *brry){
    brry->size = 0;
    brry->capacity = 2;
    brry->data = (char *)calloc(brry->capacity, sizeof(char));
}

void brry_insert(Brry *brry, char value){
    brry->data[brry->size++] = value; // should check here if there is enough memory, but leaving that out to test something
}

int main(void){
    Brry brry;
    brry_init(&brry);
    for (int i = 0; i < 3; i++) {
        brry_insert(&brry, 'a');
    }
    printf("%c\n", brry.data[2]);
    return 0;
}
In my main function I insert 3 elements into the array, but I only allocated room for 2. Yet when I print the third one, it works just fine? I expected some strange value to be printed. Why is this, or am I doing something wrong?
You are writing into a buffer you didn't allocate enough memory for. That it works is not guaranteed.
What you're doing is reading some junk value from memory beyond your buffer; who knows what's there. Sometimes that leads to a segmentation fault, and other times you get lucky, read some junk value, and it doesn't segfault.
Writing into junk memory invokes undefined behavior, so you had better watch it.
If you do get errors it will almost always be a segfault, short for segmentation fault.
Read up on it here.
The technical term for what you're doing by reading past the bounds of the array is dereferencing an invalid pointer. You might also want to read more about that here.
Yes, you are indeed writing to the third element of a two element array. This means your program will exhibit undefined behavior and you have no guarantee of what is going to happen. In your case you got lucky and the program "worked", but you might not always be so lucky.
Trying to read/write past the end of the array results in undefined behaviour. Exactly what happens depends on several factors which you cannot predict or control. Sometimes, it will seem to read and/or write successfully without complaining. Other times, it may fail horribly and effectively crash your program.
The critical thing is that you should never try to use or rely on undefined behaviour. It's unfortunately a common rookie mistake to think that it will always work because one test happened to succeed. That's definitely not the case, and is a recipe for disaster sooner or later.
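A minimal sketch of the missing capacity check for brry_insert, growing the buffer by doubling (this assumes the Brry struct above; error handling is kept deliberately simple):
#include <stdlib.h>

/* Returns 0 on success, -1 if growing the buffer failed. */
int brry_insert_checked(Brry *brry, char value){
    if (brry->size == brry->capacity) {
        int new_capacity = brry->capacity * 2;
        char *new_data = realloc(brry->data, new_capacity);
        if (new_data == NULL)
            return -1;              // original buffer is still valid
        brry->data = new_data;
        brry->capacity = new_capacity;
    }
    brry->data[brry->size++] = value;
    return 0;
}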
The code below is said to give a segmentation violation:
#include <stdio.h>
#include <string.h>

void function(char *str) {
    char buffer[16];
    strcpy(buffer, str);
}

int main() {
    char large_string[256];
    int i;
    for (i = 0; i < 255; i++)
        large_string[i] = 'A';
    function(large_string);
    return 1;
}
It's compiled and run like this:
gcc -Wall -Wextra hw.cpp && a.exe
But there is no output.
NOTE: The above code does indeed overwrite the return address and so on, if you really understand what's going on underneath; the return address will be 0x41414141, to be specific. Understanding this requires profound knowledge of the stack.
You're just getting lucky. There's no reason that code has to generate a segmentation fault (or any other kind of error). It's still probably a bad idea, though. You can probably get it to fail by increasing the size of large_string.
Probably in your implementation buffer is immediately below large_string on the stack. So when the call to strcpy overflows buffer, it's just writing most of the way into large_string without doing any particular damage. It will write at least 255 bytes, but whether it writes more depends what's above large_string (and the uninitialised value of the last byte of large_string). It seems to have stopped before doing any damage or segfaulting.
By fluke, the return address of the call to function isn't being trashed. Either it's below buffer on the stack, or it's in a register, or maybe the function is inlined; I can't remember what no optimisation does, and if you can't be bothered to check the disassembly, I can't either ;-). So you're returning and exiting without problems.
Whoever said that code would give a segfault probably isn't reliable. It results in undefined behaviour. On this occasion, the behaviour was to output nothing and exit.
[Edit: I checked on my compiler (GCC on cygwin), and for this code it is using the standard x86 calling convention and entry/exit code. And it does segfault.]
You're compiling a .cpp (C++) program by invoking gcc (instead of g++)... not sure if this is the cause, but on a Linux system (it appears you're running on Windows, given the default .exe output) it throws the following error when compiled as you have stated:
/tmp/ccSZCCBR.o:(.eh_frame+0x12): undefined reference to `__gxx_personality_v0'
collect2: ld returned 1 exit status
It's UB (undefined behavior). strcpy might have copied more bytes into the memory pointed to by buffer, and it might not cause a problem at that moment.
It's undefined behavior, which means that anything can happen. The program can even appear to work correctly.
It seems that you just happened not to overwrite any parts of memory that are still needed by the rest of the (short) program (or that are outside the program's address space, write-protected, etc.), so nothing special happens. At least nothing that would lead to any output.
There's a zero byte on the stack somewhere that stops the strcpy(), and there's enough room on the stack not to hit a protected page. Try printing out strlen(buffer) in that function. In any case, the result is undefined behavior.
Get into the habit of using the strlcpy(3) family of functions.
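A short sketch with strlcpy (a BSD extension; on Linux it may need libbsd's <bsd/string.h> or a recent glibc):
#include <string.h>

void function(const char *str) {
    char buffer[16];
    /* Copies at most sizeof buffer - 1 bytes and always
       NUL-terminates, so the overflow above cannot happen. */
    strlcpy(buffer, str, sizeof buffer);
}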
You can test this in other ways:
#include <stdlib.h>

int main() {
    int *a = (int *)malloc(10*sizeof(int));
    int i;
    for (i = 0; i < 1000000; i++) a[i] = i;
    return 0;
}
On my machine, this causes SIGSEGV only at around i = 37000! (Tested by inspecting the core with gdb.)
To guard against these problems, test your programs with a malloc debugger... and use lots of mallocs, since no memory-debugging library I know of can look into static memory. Example: Electric Fence.
gcc -g -Wall docore.c -o c -lefence
And now the SIGSEGV is triggered as soon as i=10, as would be expected.
As everyone says, your program has undefined behaviour. In fact your program has more bugs than you thought it did, but after it's already undefined it doesn't get any further undefined.
Here's my guess about why there was no output. You didn't completely disable optimization. The compiler saw that the code in function() doesn't have any defined effect on the rest of the program. The compiler optimized out the call to function().
Odds are that the long string is, in fact, terminated by the zero byte in i. Assuming that the variables in main are laid out in the order they are declared -- which isn't required by anything in the language spec that I know of but seems likely in practice -- then large_string would be first in memory, followed by i. The loop sets i to 0 and counts up to 255. Whether i is stored big-endian or little-endian, either way it has a zero byte in it. So in traversing large_string, at either byte 256 or 257 you'll hit a null byte.
Beyond that, I'd have to study the generated code to figure out why this didn't blow up. As you seem to indicate, I'd expect the copy into buffer to overwrite the return address of the call, so when it tried to return you'd be heading into deep space somewhere and would quickly blow up on something.
But as others say, "undefined" means "unpredictable".
There may be anything in your char buffer[16], including \0. strcpy copies until it finds the first \0, thus perhaps not going far beyond your boundary of 16 characters.