I am trying get a handle on C as I work my way thru Jim Trevor's "Cyclone: A safe dialect of C" for a PL class. Trevor and his co-authors are trying to make a safe version of C, so they eliminate uninitialized pointers in their language. Googling around a bit on uninitialized pointers, it seems like un-initialized pointers point to random locations in memory. It seems like this alone makes them unsafe. If you reference an un-itilialized pointer, you jump to an unsafe part of memory. Period. But the way Trevor talks about them seems to imply that it is more complex. He cites the following code, and explains that when the function FrmGetObjectIndex dereferences f, it isn’t accessing a valid pointer, but rather an unpredictable address — whatever was on the stack when the space for f was allocated.
What does Trevor mean by "whatever was on the stack when the space for f was allocated"? Are "un-initialized" pointers initialized to random locations in memory by default? Or does their "random" behavior have to do with the memory allocated for these pointers getting filled with strange values (that are then referenced) because of unexpected behavior on the stack.
Form *f;
switch (event->eType) {
case frmOpenEvent:
f = FrmGetActiveForm(); ...
case ctlSelectEvent:
i = FrmGetObjectIndex(f, field); ...
}
What does Trevor mean by "whatever was on the stack when the space for f was allocated"?
He means that in most assembly languages, separate instructions are used to reserve space on the stack and to write an initial value inside the newly reserved space. If a C program uses an uninitialized variable, the program will typically at run-time execute the instruction that reserves stack space but no instruction that sets it. When the pointer is used, it will literally contain the bit pattern that was on the stack before space was reserved. In the good cases this will be an invalid address. In the bad cases this will happen to be a valid address, and the effects will be unpredictable.
This is only a typical behavior. From the theoretical point of view, using an indeterminate value is undefined behavior. Much stranger things than simply accessing an invalid address or a valid one can happen (examples with uninitialized data (not addresses) used accidentally or purposely).
Here is the sort of dangers that a restricted subset of C such as Cyclone aims to prevent:
int a, *p;
int main(int c, char **v){
int l, *lp, i;
if (c & 1)
a = l + 1; // danger
if (c & 2)
*lp = 3; // danger
if (c & 4)
{
p = &a;
for (i=0; i<=1; i++)
{
int block_local;
*p = 4; // danger
p = &block_local;
}
}
}
In the last dangerous line, in practice, it is most likely that 4 will be written to variable block_local, but in reality, at the second iteration, p is indeterminate, the program is not supposed to access *p, and it is undefined behavior when it does.
On modern OS's the danger is a core dump. On earlier systems without memory management and possibly memory mapped i/o to external HW the dangers are of completely different magnitude.
Related
I posted a question about some pointer issues I've been having earlier in this question:
C int pointer segmentation fault several scenarios, can't explain behaviour
From some of the comments, I've been led to believe that the following:
#include <stdlib.h>
#include <stdio.h>
int main(){
int *p;
*p = 1;
printf("%d\n", *p);
return 0;
}
is undefined behaviour. Is this true? I do this all the time, and I've even seen it in my C course.
However, when I do
#include <stdlib.h>
#include <stdio.h>
int main(){
int *p=NULL;
*p = 1;
printf("%d\n", *p);
return 0;
}
I get a seg fault right before printing the contents of p (after the line *p=1;). Does this mean I should have always been mallocing any time I actually assign a value for a pointer to point to?
If that's the case, then why does char *string = "this is a string" always work?
I'm quite confused, please help!
This:
int *p;
*p = 1;
Is undefined behavior because p isn't pointing anywhere. It is uninitialized. So when you attempt to dereference p you're essentially writing to a random address.
What undefined behavior means is that there is no guarantee what the program will do. It might crash, it might output strange results, or it may appear to work properly.
This is also undefined behaivor:
int *p=NULL;
*p = 1;
Because you're attempting to dereference a NULL pointer.
This works:
char *string = "this is a string" ;
Because you're initializing string with the address of a string constant. It's not the same as the other two cases. It's actually the same as this:
char *string;
string = "this is a string";
Note that here string isn't being dereferenced. The pointer variable itself is being assigned a value.
Yes, doing int *p; *p = 1; is undefined behavior. You are dereferencing an uninitialized pointer (accessing the memory to which it points). If it works, it is only because the garbage in p happened to be the address of some region of memory which is writable, and whose contents weren't critical enough to cause an immediate crash when you overwrote them. (But you still might have corrupted some important program data causing problems you won't notice until later...)
An example as blatant as this should trigger a compiler warning. If it doesn't, figure out how to adjust your compiler options so it does. (On gcc, try -Wall -O).
Pointers have to point to valid memory before they can be dereferenced. That could be memory allocated by malloc, or the address of an existing valid object (p = &x;).
char *string = "this is a string"; is perfectly fine because this pointer is not uninitialized; you initialized it! (The * in char *string is part of its declaration; you aren't dereferencing it.) Specifically, you initialized it with the address of some memory which you asked the compiler to reserve and fill in with the characters this is a string\0. Having done that, you can safely dereference that pointer (though only to read, since it is undefined behavior to write to a string literal).
is undefined behaviour. Is this true?
Sure is. It just looks like it's working on your system with what you've tried, but you're performing an invalid write. The version where you set p to NULL first is segfaulting because of the invalid write, but it's still technically undefined behavior.
You can only write to memory that's been allocated. If you don't need the pointer, the easiest solution is to just use a regular int.
int p = 1;
In general, avoid pointers when you can, since automatic variables are much easier to work with.
Your char* example works because of the way strings work in C--there's a block of memory with the sequence "this is a string\0" somewhere in memory, and your pointer is pointing at that. This would be read-only memory though, and trying to change it (i.e., string[0] = 'T';) is undefined behavior.
With the line
char *string = "this is a string";
you are making the pointer string point to a place in read-only memory that contains the string "this is a string". The compiler/linker will ensure that this string will be placed in the proper location for you and that the pointer string will be pointing to the correct location. Therefore, it is guaranteed that the pointer string is pointing to a valid memory location without any further action on your part.
However, in the code
int *p;
*p = 1;
p is uninitialized, which means it is not pointing to a valid memory location. Dereferencing p will therefore result in undefined behavior.
It is not necessary to always use malloc to make p point to a valid memory location. It is one possible way, but there are many other possible ways, for example the following:
int i;
int *p;
p = &i;
Now p is also pointing to a valid memory location and can be safely dereferenced.
Consider the code:
#include <stdio.h>
int main(void)
{
int i=1, j=2;
int *p;
... some code goes here
*p = 3;
printf("%d %d\n", i, j);
}
Would the statement *p = 2; write to i, j, or neither? It would write to i or j if p points to that object, but not if p points somewhere else. If the ... portion of the code doesn't do anything with p, then p might happen point to i, or j, or something within the stdout object, or anything at all. If it happens to point to i or j, then the write *p = 3; might affect that object without any side effects, but if it points to information within stdout that controls where output goes, it might cause the following printf to behave in unpredictable fashion. In a typical implementation, p might point anywhere, and there will be so many things to which p might point that it would be impossible to predict all of the possible effects of writing to them.
Note that the Standard classifies many actions as "Undefined Behavior" with the intention that many or even most implementations will extend the semantics of the language by documenting their behavior. Most implementations, for example, extend the meaning of the << operator to allow it to be used to multiply negative numbers by power of two. Even on implementations that extend the language to specify that an assignment like *p = 3; will always perform a word-sized write of the value 3 to the indicated address, with whatever consequence results, there would be relatively few platforms(*) where it would be possible to fully characterize all possible effects of that action in cases where nothing is known about the value of p. In cases where pointers are read rather than written, some systems may be able to offer useful behavioral guarantees about the effect of arbitrary stray reads, but not all(**).
(*) Some freestanding platforms which keep code in read-only storage may be able to uphold some behavioral guarantees even if code writes to arbitrary pointer addresses. Such behavioral guarantees may be useful in systems whose state might be corrupted by electrical interference, but even when targeting such systems writing to a stray pointer would never be useful.
(**) On many platforms, stray reads will either yield a meaningless value without side effects or force an abnormal program termination, but on an Apple II which a Disk II card in the customary slot-6 location, if code reads from address 0xC0EF within a second of performing a disk access, the drive head to start overwriting whatever happens to be on the last track accessed. This is by design (software that needs to write to the disk does so by accessing address 0xC0EF, and having hardware respond to both reads and writes required one less logic gate--and thus one less chip--than would be required for hardware that only responded to writes) but does mean that code must be careful not to perform any stray reads.
I was reading through some source code and found a functionality that basically allows you to use an array as a linked list? The code works as follows:
#include <stdio.h>
int
main (void)
{
int *s;
for (int i = 0; i < 10; i++)
{
s[i] = i;
}
for (int i = 0; i < 10; i++)
{
printf ("%d\n", s[i]);
}
return 0;
}
I understand that s points to the beginning of an array in this case, but the size of the array was never defined. Why does this work and what are the limitations of it? Memory corruption, etc.
Why does this work
It does not, it appears to work (which is actually bad luck).
and what are the limitations of it? Memory corruption, etc.
Undefined behavior.
Keep in mind: In your program whatever memory location you try to use, it must be defined. Either you have to make use of compile-time allocation (scalar variable definitions, for example), or, for pointer types, you need to either make them point to some valid memory (address of a previously defined variable) or, allocate memory at run-time (using allocator functions). Using any arbitrary memory location, which is indeterminate, is invalid and will cause UB.
I understand that s points to the beginning of an array in this case
No the pointer has automatic storage duration and was not initialized
int *s;
So it has an indeterminate value and points nowhere.
but the size of the array was never defined
There is neither array declared or defined in the program.
Why does this work and what are the limitations of it?
It works by chance. That is it produced the expected result when you run it. But actually the program has undefined behavior.
As I have pointed out first on the comments, what you are doing does not work, it seems to work, but it is in fact undefined behaviour.
In computer programming, undefined behavior (UB) is the result of
executing a program whose behavior is prescribed to be unpredictable,
in the language specification to which the computer code adheres.
Hence, it might "work" sometimes, and sometimes not. Consequently, one should never rely on such behaviour.
If it would be that easy to allocate a dynamic array in C what would one use malloc?! Try it out with a bigger value than 10 to increase the likelihood of leading to a segmentation fault.
Look into the SO Thread to see the how to properly allocation and array in C.
This question already has answers here:
What's the point of malloc(0)?
(17 answers)
Closed 4 years ago.
I know that malloc(size_t size) allocates size bytes and returns a pointer to the allocated memory. .
So how come when I allocate zero bytes to the integer pointer p, I am still able to initialize it?
#include <stdio.h>
#include <stdlib.h>
int main()
{
int *p = malloc(0);
*p = 10;
printf("Pointer address is: %p\n", p);
printf("The value of the pointer: %d\n", *p);
return 0;
}
Here is my program output, I was expecting a segmentation fault.
Pointer address is: 0x1ebd260
The value of the pointer: 10
The behavior of malloc(0) is implementation defined, it will either return a pointer or NULL. As per C Standard, you should not use the pointer returned by malloc when requested zero size space1).
Dereferencing the pointer returned by malloc(0) is undefined behavior which includes the program may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.
1) From C Standard#7.22.3p1 [emphasis added]:
1 The order and contiguity of storage allocated by successive calls to the aligned_alloc, calloc, malloc, and realloc functions is unspecified. The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated). The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object. The pointer returned points to the start (lowest byte address) of the allocated space. If the space cannot be allocated, a null pointer is returned. If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object.
When you call malloc(0) and write to the returned buffer, you invoke undefined behavior. That means you can't predict how the program will behave. It might crash, it might output strange results, or (as in this case) it may appear to work properly.
Just because the program could crash doesn't mean it will.
In general, C does not prevent you from doing incorrect things.
After int *p = malloc(0);, p has some value. It might be a null pointer, or it might point to one or more bytes of memory. In either case, you should not use it.1 But the C language does not stop you from doing so.
When you execute *p = 10;, the compiler may have generated code to write 10 to the place where p points. p may be pointing at actual writable memory, so the store instruction may execute without failing. And then you have written 10 to a place in memory where you should not. At this point, the C standard no longer specifies what the behavior of your program is—by writing to an inappropriate place, you have broken the model of how C works.
It is also possible your compiler recognizes that *p = 10; is incorrect code in this situation and generates something other than the write to memory described above. A good compiler might give you a warning message for this code, but the compiler is not obligated to do this by the C standard, and it can allow your program to break in other ways.
Footnote
1 If malloc returns a null pointer, you should not write to *p because it is not pointing to an object. If it returns something else, you should not write to *p because C 2018 7.22.3.1 says, for this of malloc(0), “the returned pointer shall not be used to access an object.”
I think it is good idea to look at segmentation fault definition https://en.wikipedia.org/wiki/Segmentation_fault
The following are some typical causes of a segmentation fault:
Attempting to access a nonexistent memory address (outside process's address space)
Attempting to access memory the program does not have rights to (such as kernel structures in process context)
Attempting to write read-only memory (such as code segment)
So in general, access to any address in program's data segment will not lead to seg fault. This approach makes sence as C doesn't have any framework with memory management in it (like C# or Java). So as malloc in this example returns some address (not NULL) it returns it from program's data segment which can be accessed by program.
However if program is complex, such action (*p = 10) may overwrite data belonging to some other variable or object (or even pointer!) and lead to undefined behaviour (including seg fault).
But please take in account, that as described in other answers such program is not best practice and you shouldn't use such approach in production.
This is a source code of malloc() (maybe have a litter difference between kernel versions but concept still like this), It can answer your question:
static void *malloc(int size)
{
void *p;
if (size < 0)
error("Malloc error");
if (!malloc_ptr)
malloc_ptr = free_mem_ptr;
malloc_ptr = (malloc_ptr + 3) & ~3; /* Align */
p = (void *)malloc_ptr;
malloc_ptr += size;
if (free_mem_end_ptr && malloc_ptr >= free_mem_end_ptr)
error("Out of memory");
malloc_count++;
return p;
}
When you assigned size is 0, it already gave you a pointer to first address of memory
In C, can one stuff a -1 value (e.g. 0xFFFFFFFF) into a pointer, using an approach such as this one, and expect that such memory address is never allocated at runtime?
The idea is that the pointer value be used as a memory address, except if it has this "special" -1 value. The pointer should be considered memory address even if it is NULL (in which case, the object to which it points to has not yet been built).
I understand this may be platform dependent, but the program in question is expected to run in Linux, Windows and MacOSX.
The problem at hand is much larger than what is described here, so comments or answers which question this approach are not useful. I know it's a bit hacky, but the alternative is a major refactor :/
Thanks in advance.
It is GRAS (generally recognized as safe). No major OS will allocate memory that would collide with your chosen sentinel. However, there are a few pathological cases where it would be invalid to make this assumption. For instance, a pathological C++ compiler may choose to start the stack at 0xFFFFFFFF, without violating any constraints in the spec.
Within just the scope of sane OS's, it is nearly impossible to have 0xFFFFFFFF (or its 64-bit equivalent) to be a valid memory address. It cannot be a valid memory address of an array (C++ rules forbid it). It could technically be a valid index of a char of an object allocated at the end of space, but there's two things that prevent that.
Most OSs have some padding
Most OSs use high memory values as Kernel memory.
If you have an opportunity to use a global value as a sentinel, it is guaranteed to be safe.
char sentinel;
char* p = "Hello";
char* p2 = 0; // null pointer
char* p3 = &sentinel;
if (p3 == &sentinel)
cout << "p3 was a sentinel" << endl;
One way to define a sentinel value that no other valid address will coincide with is a static variable:
static t sentinel;
t *p = &sentinel;
If you are going to assume a flat address space and that all pointers have the same width, you can minimize the overhead by declaring sentinel of type char instead of t.
To answer your question about (t*)-1:
-1 has type int. I would recommend (t*)(uintptr_t)-1, which is more likely to be the last address even for a 64-bit flat address space.
it is not very clean, but it should work on all commonplace architectures because, as long as the compiler intends to compare pointers using the unsigned comparison assembly instruction (as it usually does), for any object a that the compiler could hope to place at the end of the address space, &a + 1 has to compare greater than &a. In practice, this prevents the last address to be used to store anything.
This is actually a much more concise, much more clear question than the one I had asked here before(for any who cares): C Language: Why does malloc() return a pointer, and not the value? (Sorry for those who initially think I'm spamming... I hope it's not construed as the same question since I think the way I phrased it there made it unintentionally misleading)
-> Basically what I'm trying to ask is: Why does a C programmer need a pointer to a dynamically-allocated variable/object? (whatever the difference is between variable/object...)
If a C programmer has the option of creating just 'int x' or just 'int *x' (both statically allocated), then why can't he also have the option to JUST initialize his dynamically-allocated variable/object as a variable (and NOT returning a pointer through malloc())?
*If there are some obscure ways to do what I explained above, then, well, why does malloc() seem the way that most textbooks go about dynamic-allocation?
Note: in the following, byte refers to sizeof(char)
Well, for one, malloc returns a void *. It simply can't return a value: that wouldn't be feasible with C's lack of generics. In C, the compiler must know the size of every object at compile time; since the size of the memory being allocated will not be known until run time, then a type that could represent any value must be returned. Since void * can represent any pointer, it is the best choice.
malloc also cannot initialize the block: it has no knowledge of what's being allocated. This is in contrast with C++'s operator new, which does both the allocation and the initialization, as well as being type safe (it still returns a pointer instead of a reference, probably for historical reasons).
Also, malloc allocates a block of memory of a specific size, then returns a pointer to that memory (that's what malloc stands for: memory allocation). You're getting a pointer because that's what you get: an unitialized block of raw memory. When you do, say, malloc(sizeof(int)), you're not creating a int, you're allocating sizeof(int) bytes and getting the address of those bytes. You can then decide to use that block as an int, but you could also technically use that as an array of sizeof(int) chars.
The various alternatives (calloc, realloc) work roughly the same way (calloc is easier to use when dealing with arrays, and zero-fills the data, while realloc is useful when you need to resize a block of memory).
Suppose you create an integer array in a function and want to return it. Said array is a local variable to the function. You can't return a pointer to a local variable.
However, if you use malloc, you create an object on the heap whose scope exceeds the function body. You can return a pointer to that. You just have to destroy it later or you will have a memory leak.
It's because objects allocated with malloc() don't have names, so the only way to reference that object in code is to use a pointer to it.
When you say int x;, that creates an object with the name x, and it is referenceable through that name. When I want to set x to 10, I can just use x = 10;.
I can also set a pointer variable to point to that object with int *p = &x;, and then I can alternatively set the value of x using *p = 10;. Note that this time we can talk about x without specifically naming it (beyond the point where we acquire the reference to it).
When I say malloc(sizeof(int)), that creates an object that has no name. I cannot directly set the value of that object by name, since it just doesn't have one. However, I can set it by using a pointer variable that points at it, since that method doesn't require naming the object: int *p = malloc(sizeof(int)); followed by *p = 10;.
You might now ask: "So, why can't I tell malloc to give the object a name?" - something like malloc(sizeof(int), "x"). The answer to this is twofold:
Firstly, C just doesn't allow variable names to be introduced at runtime. It's just a basic restriction of the language;
Secondly, given the first restriction the name would have to be fixed at compile-time: if this is the case, C already has syntax that does what you want: int x;.
You are thinking about things wrong. It is not that int x is statically allocated and malloc(sizeof(int)) is dynamic. Both are allocated dynamically. That is, they are both allocated at execution time. There is no space reserved for them at the time you compile. The size may be static in one case and dynamic in the other, but the allocation is always dynamic.
Rather, it is that int x allocates the memory on the stack and malloc(sizeof(int)) allocates the memory on the heap. Memory on the heap requires that you have a pointer in order to access it. Memory on the stack can be referenced directly or with a pointer. Usually you do it directly, but sometimes you want to iterate over it with pointer arithmetic or pass it to a function that needs a pointer to it.
Everything works using pointers. "int x" is just a convenience - someone, somewhere got tired of juggling memory addresses and that's how programming languages with human-readable variable names were born.
Dynamic allocation is... dynamic. You don't have to know how much space you are going to need when the program runs - before the program runs. You choose when to do it and when to undo it. It may fail. It's hard to handle all this using the simple syntax of static allocation.
C was designed with simplicity in mind and compiler simplicity is a part of this. That's why you're exposed to the quirks of the underlying implementations. All systems have storage for statically-sized, local, temporary variables (registers, stack); this is what static allocation uses. Most systems have storage for dynamic, custom-lifetime objects and system calls to manage them; this is what dynamic allocation uses and exposes.
There is a way to do what you're asking and it's called C++. There, "MyInt x = 42;" is a function call or two.
I think your question comes down to this:
If a C programmer has the option of creating just int x or just int *x (both statically allocated)
The first statement allocates memory for an integer. Depending upon the placement of the statement, it might allocate the memory on the stack of a currently executing function or it might allocate memory in the .data or .bss sections of the program (if it is a global variable or static variable, at either file scope or function scope).
The second statement allocates memory for a pointer to an integer -- it hasn't actually allocated memory for the integer itself. If you tried to assign a value using the pointer *x=1, you would either receive a very quick SIGSEGV segmentation violation or corrupt some random piece of memory. C doesn't pre-zero memory allocated on the stack:
$ cat stack.c
#include <stdio.h>
int main(int argc, char *argv[]) {
int i;
int j;
int k;
int *l;
int *m;
int *n;
printf("i: %d\n", i);
printf("j: %d\n", j);
printf("k: %d\n", k);
printf("l: %p\n", l);
printf("m: %p\n", m);
printf("n: %p\n", n);
return 0;
}
$ make stack
cc stack.c -o stack
$ ./stack
i: 0
j: 0
k: 32767
l: 0x400410
m: (nil)
n: 0x4005a0
l and n point to something in memory -- but those values are just garbage, and probably don't belong to the address space of the executable. If we store anything into those pointers, the program would probably die. It might corrupt unrelated structures, though, if they are mapped into the program's address space.
m at least is a NULL pointer -- if you tried to write to it, the program would certainly die on modern hardware.
None of those three pointers actually point to an integer yet. The memory for those integers doesn't exist. The memory for the pointers does exist -- and is initially filled with garbage values, in this case.
The Wikipedia article on L-values -- mostly too obtuse to fully recommend -- makes one point that represented a pretty significant hurdle for me when I was first learning C: In languages with assignable variables it becomes necessary to distinguish between the R-value (or contents) and the L-value (or location) of a variable.
For example, you can write:
int a;
a = 3;
This stores the integer value 3 into whatever memory was allocated to store the contents of variable a.
If you later write:
int b;
b = a;
This takes the value stored in the memory referenced by a and stores it into the memory location allocated for b.
The same operations with pointers might look like this:
int *ap;
ap=malloc(sizeof int);
*ap=3;
The first ap= assignment stores a memory location into the ap pointer. Now ap actually points at some memory. The second assignment, *ap=, stores a value into that memory location. It doesn't update the ap pointer at all; it reads the value stored in the variable named ap to find the memory location for the assignment.
When you later use the pointer, you can choose which of the two values associated with the pointer to use: either the actual contents of the pointer or the value pointed to by the pointer:
int *bp;
bp = ap; /* bp points to the same memory cell as ap */
int *bp;
bp = malloc(sizeof int);
*bp = *ap; /* bp points to new memory and we copy
the value pointed to by ap into the
memory pointed to by bp */
I found assembly far easier than C for years because I found the difference between foo = malloc(); and *foo = value; confusing. I hope I found what was confusing you and seriously hope I didn't make it worse.
Perhaps you misunderstand the difference between declaring 'int x' and 'int *x'. The first allocates storage for an int value; the second doesn't - it just allocates storage for the pointer.
If you were to "dynamically allocate" a variable, there would be no point in the dynamic allocation anyway (unless you then took its address, which would of course yield a pointer) - you may as well declare it statically. Think about how the code would look - why would you bother with:
int x = malloc(sizeof(int)); *x = 0;
When you can just do:
int x = 0;