Segfault when dereferencing a custom mem address (C)

I want to declare a pointer, have it hold a custom address and then assign a value to it:
void main()
{
char *ptr;
ptr = (char *)0x123123; //the assignment works perfectly with a cast
printf("%p\n", ptr); //and the pointer indeed holds the address it's supposed to
*ptr = 'a'; //but this breaks
puts("2");
}
Initially I thought the reason was that I'm trying to dereference uninitialized memory. But I doubt that this is the case, since some_type *some_ptr = &some_variable; works flawlessly, so the problem must be the address I assign.
Then I thought that, in the same way 3 or 'a' or "alpine" are constants, (char *) 0x123123 must be a constant too, and constants can't be edited in C. But that still can't be it, because an attempt to change a const value will not compile.
3rd assumption would be that such an address must be unavailable, but this doesn't make sense either, because line 4 works always, no matter the address I give or the type of the pointer.

3rd assumption would be that such an address must be unavailable,
That is correct: on modern OSes (which all have memory protection) you can't write to an arbitrary memory address.
It used to be possible to access any memory on OSes that didn't use virtual memory (such as MS-DOS), but allowing that is generally a very bad idea: it let any program corrupt OS state, and required very frequent reboots.
but this doesn't make sense either, because line 4 works always, no matter the address I give or the type of the pointer.
You are confusing two distinct operations: printing an address (allowed no matter what that address is) and dereferencing an address, i.e. reading or modifying the value stored at the address (only allowed for valid addresses).
The distinction is similar to "can you print an address?" (e.g. "123 Main Street, SomeTown, SomeCountry"), and "can you enter a house at that address?" (not possible for the above address because there is no "SomeCountry" on Earth). Even if the address is valid, e.g. "1600 Pennsylvania Ave NW, Washington, DC 20500", you may still not be allowed to enter it.

The OP clarified elsewhere that this is actually an XY problem.
The X problem (the real goal): implementing a linked list that uses consecutive memory.
The Y problem (the attempted solution that was asked about): reading/writing to arbitrary memory locations.
Of course, the answer to the linked list problem is: you have to implement your own complete memory management system to get there.
As in: first, you use malloc() to acquire a large block of consecutive memory. Then you can use arbitrary pointers within that block of memory. But of course, your code has to track which addresses are already in use, and correctly "free" them when list nodes get deleted.
The tricky part is handling the corner cases, such as: what happens when your last "pointer" gets used up? Do you malloc() a larger area, and move all data in memory?
Finally: assume that you don't manage a block of memory, but a single array (linked list implementations are often based on arrays, as that makes some things much easier).
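To make the array-based idea concrete, here is a minimal sketch of a linked list whose nodes all live in one consecutive array and link to each other by index instead of by raw address. The names (struct list, list_init, list_push_front, list_pop_front, POOL_CAPACITY, NIL) are made up for illustration, and the fixed capacity is an assumption; growing the pool is left out.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_CAPACITY 8   /* hypothetical fixed capacity */
#define NIL (-1)          /* marks "no node", playing the role of NULL */

struct node {
    int value;
    int next;             /* index of the next node in the pool, or NIL */
};

struct list {
    struct node pool[POOL_CAPACITY];  /* all nodes live in one consecutive array */
    int head;                         /* index of the first list element */
    int free_head;                    /* index of the first unused slot */
};

/* Chain every slot into the free list and start with an empty list. */
void list_init(struct list *l)
{
    assert(l != NULL);
    for (int i = 0; i < POOL_CAPACITY; i++)
        l->pool[i].next = (i + 1 < POOL_CAPACITY) ? i + 1 : NIL;
    l->head = NIL;
    l->free_head = 0;
}

/* Insert at the front. Returns 0 on success, -1 when the pool is used up. */
int list_push_front(struct list *l, int value)
{
    int slot = l->free_head;
    if (slot == NIL)
        return -1;
    l->free_head = l->pool[slot].next;   /* take the slot off the free list */
    l->pool[slot].value = value;
    l->pool[slot].next = l->head;
    l->head = slot;
    return 0;
}

/* Remove the front element and hand its slot back to the free list.
   Returns 0 on success, -1 when the list is empty. */
int list_pop_front(struct list *l, int *out)
{
    int slot = l->head;
    if (slot == NIL)
        return -1;
    *out = l->pool[slot].value;
    l->head = l->pool[slot].next;
    l->pool[slot].next = l->free_head;   /* "free" the slot for reuse */
    l->free_head = slot;
    return 0;
}
```

The free list is what tracks which slots are in use, so the "which addresses are already used" bookkeeping reduces to two indices. When list_push_front returns -1 you have hit the corner case mentioned above and must decide whether to grow the pool.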

Writing to an arbitrary memory address is dangerous and not allowed by modern operating systems; it is better to allocate a block of memory and write to that.
e.g. using malloc:
ptr = malloc(32); // now you can write to this memory block, and it is perfectly legal
*ptr = 'a';

Related

Pointers, malloc and Compilation

I'm trying to figure out how pointers work with allocating memory to them and declaring them, although I kinda know how they work, I'm still getting confused and I'm not sure if it's because of my compiler or something.
I currently use CodeBlocks with GNU/GCC compiler as default, and this is the code I'm running:
#include <stdio.h>
int main()
{
int a = 2;
int *b = 5;
printf("%d\n", a);
printf("%d\n", b);
}
The problem is that both of these write out correct result, why would I need to use malloc if I can just write out *b = 5 and declare it like that, isn't the purpose of malloc to allocate memory to a pointer so you can declare it after?
Is it compilers fault that it allows this to compile or I'm just not getting the point of malloc?
The problem is that both of these write out correct result, why would I need to use malloc if I can just write out *b = 5 and declare it like that, isn't the purpose of malloc to allocate memory to a pointer so you can declare it after?
You can point pointers at existing memory when you already know what data/code lives there. In your case you're just making a pointer point to address 5, and you have no idea what's at memory address 5.
I think you're slightly confused on what memory allocation is actually doing. It's not magically making memory appear for you to use, it's using something called the heap to make memory accessible aka read/write and depending on what permissions you give possibly executable. You allocate memory so you can use that memory region. If you don't allocate memory and just access a random memory region, you have no idea what will happen. Most likely it will just crash. But you could also be accessing/overwriting existing critical information to the process.
Let's assume calling malloc returns the address 0xCAFE. This means that address or region of memory is made accessible for us. But we know 0xCAFE already exists in the process memory, so could we not just make a pointer point to it and use it? No, because you have no idea if that region is accessible, or if it's already being used (allocated by a previous call to malloc), or if it may be used in the future (allocated by a future call to malloc), in which case you'll end up corrupting memory.
The fact that the result is what you expect doesn't mean the code is correct. Your program is ill-formed: in int *b = 5 you are assigning the value 5 to a pointer, and it will be interpreted as a memory address. That's what pointers are for: they hold a memory address that you can use to access the data and, for instance, pass as a function argument so that the data can be manipulated.
If you just want to store an int, use an int variable. Storing 5 in a pointer doesn't make much sense, even when the compiler accepts it (typically with a warning), and dereferencing the pointer will invoke undefined behavior, so you can't really use it as a pointer, which is what it is.
You use malloc so that the system gives your program, and that specific pointer, a usable memory address where you can store data for later use (5 is almost certainly not such an address).
printf("%d\n", b) is also incorrect: the specifier for printing a pointer value, i.e. a memory address, is %p, so passing a pointer to %d is undefined behavior. The correct expression would be:
printf("%p\n", (void*)b);
C gives the programmer leeway to do things that other programming languages don't allow. That is an advantage, but it can also be a problem: programs that compile and seem to run properly may still be broken. When an ill-formed program compiles and runs, its behavior falls into the category of undefined behavior.
This is defined in the standard and gives compilers discretionary power to treat the code in any way they see fit, which includes producing a seemingly correct result. The problem is that this may work today and crash tomorrow, or vice versa; it is completely unreliable.
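For contrast, here is a small sketch of what the well-defined version could look like: the pointer gets its address from malloc, the int is stored through it, and each printf uses the matching specifier. The helper names make_int and show_int are invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocate room for one int on the heap and store `value` in it.
   Returns NULL if malloc fails; the caller must free() the result. */
int *make_int(int value)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    *p = value;      /* fine: p points at memory the allocator handed us */
    return p;
}

/* Print both the address held by the pointer and the int stored there. */
void show_int(int *p)
{
    assert(p != NULL);
    printf("address: %p\n", (void *)p);  /* %p expects a void * */
    printf("value:   %d\n", *p);         /* %d expects an int */
}
```

Calling int *b = make_int(5); followed by show_int(b); and free(b); prints an address chosen by the allocator (it will differ between runs and systems) and the value 5.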

Pointer alignment issue

I have the content of a file already loaded in memory and I want to assign the data from the file to a convenient set of structs, and I don't want to allocate new memory.
So I have the pointer of the memory where the data from the file starts, from there I work down this pointer assigning the values to different structs but then I reach a point where the program crashes.
//_pack_dynamic is the pointer to the data in memory
us *l_all_indexes = (us *) _pack_dynamic; //us is an unsigned short
printf("Index 0:%d", l_all_indexes[0]); //here is where the program crashes
_pack_dynamic += sizeof(us) * m_number_of_indexes;
The data, at least for the first element, is there, I can get it out like so:
us temp;
memcpy(&temp, _pack_dynamic, sizeof(us));
Any idea how I could extract all the indexes (m_number_of_indexes) from _pack_dynamic and assign them to l_all_indexes without allocating new memory?
Accessing _pack_dynamic as if it contained us object(s) has undefined behaviour unless it actually does contain such objects. (This is a slight simplification, but a good rule of thumb: an array of char certainly cannot be interpreted as short.)
The memcpy way into a proper us object is the only standard way to interpret memory as an object. Another approach for integers is to read char by char and shift-mask-or them together. This approach allows assuming a particular endianness instead of native.
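Both approaches can be sketched roughly as follows. The function names are invented for this example, and the shift version assumes the file stores each 16-bit value least significant byte first; swap the two bytes if your format is big-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned short us;   /* same alias the question uses */

/* Read the index-th us from a byte buffer in the machine's native byte
   order. memcpy places no alignment requirement on its source. */
us read_us_native(const unsigned char *buf, size_t index)
{
    us value;
    assert(buf != NULL);
    memcpy(&value, buf + index * sizeof value, sizeof value);
    return value;
}

/* Read the index-th 16-bit value assuming little-endian storage in the
   buffer, whatever the native byte order of the machine is. */
unsigned read_u16_le(const unsigned char *buf, size_t index)
{
    assert(buf != NULL);
    const unsigned char *p = buf + index * 2;
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}
```

Neither function allocates anything beyond one local variable, and both work even when the buffer starts at an odd address.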
A system-dependent way that might work is to make sure that _pack_dynamic is aligned to the boundary required by us. But even then, the standard gives you no guarantees about behaviour.
"Allocating" an automatic variable has hardly any runtime overhead. Allocating a few bytes for a short is usually insignificant.

Reading an arbitrary memory location makes the program crash

I am trying to read the value at random memory location using the following c code
main()
{
int a,*b;
printf("enter the value of a");
scanf("%d",&a);
b=a;
printf("%d\n%d\n",a,*b);
getch();
}
But the program crashes when a negative value, or even zero, is entered for the variable a through scanf.
What am I doing wrong? Can't pointers have negative values?
The thing is that you are probably running on a modern, full-service operating system, and it provides a more complicated abstract machine than the one described in most intro to C programming books.
In particular, there are OS-imposed restrictions on access to arbitrary addresses. You don't notice this when you look at addresses associated with standard variables because you do have permission to use the stack and the data segment, and the alloc family of functions takes care of making sure that you have permission to access the parts of the heap that they hand you.
So, what you are doing is accessing memory for which you do not have permission, which results in a failure called a "segmentation fault" on most OSes, and that abruptly ends your program.
What can you do about it?
Allocate a big block on the heap by calling char *p = malloc(1000000); then find the starting and ending addresses with something like printf("start:\t%p\nend\t%p\n",(void*)p,(void*)(p+1000000)); and only enter numbers in that range.
Note that the %p print specifier typically outputs in hexadecimal, so you probably want to enter addresses in the same base. The standard library function strtol will be helpful in that regard.
A more sophisticated approach would be to use your OS's API to request access permission for an arbitrary address, but for some values the OS is likely to simply refuse.
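A rough sketch of the "only enter numbers in that range" check might look like this. The function name checked_address is invented, and it uses strtoull rather than strtol so that a full 64-bit address fits; it also assumes that converting a pointer to uintptr_t gives the same number %p would print, which holds on common platforms but is not promised by the standard.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse hexadecimal text such as "0x55d0c2a4b2a0" and return it as a
   pointer only if it lies inside block[0 .. size-1].
   Returns NULL for malformed input or an address outside the block. */
char *checked_address(const char *text, char *block, size_t size)
{
    assert(text != NULL && block != NULL);

    char *end;
    unsigned long long raw = strtoull(text, &end, 16);  /* base 16 accepts an optional 0x prefix */
    if (end == text || *end != '\0')
        return NULL;                      /* not a clean hexadecimal number */

    uintptr_t start = (uintptr_t)block;
    if (raw < start || raw - start >= size)
        return NULL;                      /* outside the memory we own */

    return block + (raw - start);         /* derived from block, so valid to dereference */
}
```

Only dereference the result when it is not NULL; zero, negative input, and anything else outside the block are rejected before any memory access happens.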
I see some confusion here over just what a pointer is.
First, you ask the user for a value. This is fine. Then you assign that value as the location of the pointer b. This MAY be fine but likely not.
Think for a moment, what does *(-500) mean? What would *(0) mean?
In general you can never just take user input and use it without first checking it or manipulating it. This is one place where security breaches come from.
If you want to experiment with dereferencing memory, just hard code some values at first. Load the program up in a debugger and see what happens.
int c;
b = 500;
c = *b; // what happens?
b = 0;
c = *b; // what happens?
b = -100;
c = *b; // what happens?
Let me greatly oversimplify for you...
In almost all modern computers, with most operating systems, very little of the memory in the machine is directly addressable by your program. You can't take a pointer, point it at something, and try to read it. It will almost always fail.
There are generally three things that will go wrong:
The memory doesn't exist where you're pointing. Pointers can hold a large range of values, and not all of them mean anything. It's like a house number in a postal address. Technically, you can put anything you want on the envelope. Only some are valid.
The memory exists, but isn't yours. The vast majority of memory in the computer is "owned" by the operating system and if you touch it, it will terminate your program. This is for your safety.
The memory you're trying to address is valid, in the right range, but not quite the right type. From the earlier example, you might have a reasonable house number but there's no house at that location. Or the address is really an apartment and just a number won't do.
In an old 8-bit computer from the 1980's with a full 64k of memory, you could just read any location you wanted to and it would be fine. Not so much anymore.
In theory, you have permissions to read from any address within your virtual address space (e.g. 0 to 0xFFFFFFFF on a 32-bit machine).
0 and negative numbers are not a problem in themselves: once you assign them to pointers, they are converted to non-negative values.
In practice, it won't work. The OS will protect itself (and you) from this: it won't let you read from an address that doesn't really belong to you.
That is, if you haven't allocated the memory page and haven't written something there, the OS won't let you read from there.
Moreover, you don't really own the whole address space: part of it is owned by the kernel et al., so the OS won't let you access it.
Pointers are exclusively positive from what I have heard. 0 (the NULL pointer) is guaranteed not to point at anything, and dereferencing it will typically crash the program. Further, operating systems (even hardware, if I remember correctly) provide memory protection. This is why programs used to be able to crash each other, but this is now much less common. When your program runs, the OS decides what memory it has access to and will throw a segfault if you try to access memory that isn't yours.
Then again, perhaps you just wanted b = &a? This would make b point to the same place as a exists and so, when you *b it would equate to the value stored in a.
As I see it, you declare b as a pointer, hence it is wrong to do b=a; you will get a segmentation fault. Pointers hold addresses, not the values of integers, floats or chars.
Alternatively you could do b = &a, which means that b points to the memory address of a. Then you could print the value stored in a.
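A tiny sketch of the b = &a variant, wrapped in a helper whose name (replace_through) is invented for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Store new_value in the int that target points at and return the value
   that was there before. This is only valid because the caller passes
   the address of a real int. */
int replace_through(int *target, int new_value)
{
    assert(target != NULL);
    int old = *target;
    *target = new_value;
    return old;
}
```

With int a = 7; int *b = &a; the call replace_through(b, 9) returns 7 and leaves a equal to 9, because *b and a name the same object.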

C Tutorial - Wonder about `int i = *(int *)&s;`

Working my way through a C tutorial
#include <stdio.h>
int main() {
short s = 10;
int i = *(int *)&s; // wonder about this
printf("%i", i);
return 0;
}
When I tell C that the address of s is an int, should it not read 4 bytes?
Starting from the left most side of 2 bytes of s. In which case is this not critically dangerous as I don't know what it is reading since the short only assigned 2 bytes?
Should this not crash for trying to access memory that I haven't assigned/belong-to-me?
Don't do that ever
Throw away the tutorial if it teaches/preaches that.
As you pointed out, it will read more bytes than were actually allocated, so it reads some garbage value from memory that does not belong to your variable.
In fact it is dangerous: it breaks the strict aliasing rule [detail below] and causes undefined behavior.
The compiler should give you a warning like this.
warning: dereferencing type-punned pointer will break strict-aliasing rules
And you should always listen to your compiler when it cries out that warning.
[Detail]
Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other.)
The exception to the rule is a char*, which is allowed to point to any type.
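As a sketch of what is allowed instead: widening a short needs no pointer cast at all, and inspecting the raw bytes of an object is fine as long as it goes through a character pointer. Both function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Widening a short to an int needs no pointer tricks: returning it from
   an int function converts the value and never reads beyond the short. */
int widen(short s)
{
    return s;
}

/* Print the bytes of any object in hexadecimal. Access through
   unsigned char * is the case the aliasing rules explicitly permit. */
void dump_bytes(const void *object, size_t size)
{
    const unsigned char *bytes = object;
    assert(object != NULL);
    for (size_t i = 0; i < size; i++)
        printf("%02x ", bytes[i]);
    printf("\n");
}
```

For short s = 10; the call dump_bytes(&s, sizeof s) prints exactly sizeof s bytes, in whatever order the machine stores them, and never touches the neighbouring memory that *(int *)&s would read.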
First of all, never do this.
As to why it doesn't crash: since s is a local, it's allocated on the stack. If short and int have different sizes in your architecture (which is not a given), then you will probably end up reading a few more bytes from memory that's on the same memory page as the stack, so there will be no access violation (even though you will read garbage).
Probably.
This is dangerous and undefined behaviour, just as you said.
The reason why it doesn't crash on 32 (or 64) bit platforms is that most compilers allocate at least 32 bits for each stack variable. This makes the access faster, but on e.g. an 8-bit processor you would get garbage data in the upper bits instead.
No it's not going to crash your program, however it is going to be reading a portion of other variables (or possibly garbage) on the stack. I don't know what tutorial you got this from, but that kind of code is scary.
First of all, all addresses are of the same size and if you're in a 64bit architecture, each char *, short * or int * will have 8 bytes.
When using a star before an ampersand it will cancel the effect, so *&x is semantically equivalent to just x.
Basically you are right in the sense that since you are accessing an int * pointer, this will fetch 4 bytes instead of the only 2 reserved for 's' storage and the resulting content won't be a perfect reflection of what 's' really means.
However this most likely won't crash since 's' is located on the stack so depending on how your stack is laid out at this point, you will most likely read data pushed during the 'main' function prologue...
For a program to crash due to an invalid memory read, it needs to access a memory region that is not mapped, which triggers a 'segmentation fault' at the user level and a 'page fault' at the kernel level. By 'mapped' I mean there is a known mapping between a virtual memory region and a physical memory region (such mappings are handled by the operating system). That is why accessing a NULL pointer gives you such an exception: there is no valid mapping at that address in user space. A valid mapping will usually be given to you by calling something like malloc() (note that malloc() is not a syscall but a smart wrapper around one that manages your virtual memory blocks). Your stack is no exception, since it is just memory like anything else, but an area is pre-mapped for you, so when you create a local variable in a block you don't have to worry about its memory location. In this case you are not reading far enough to reach something unmapped.
Now let's say you do something like that:
short s = 10;
int *i = (int *)&s;
*i = -1;
Then in this case your program is more likely to crash since in this case you start overwriting data. Depending on the data you are touching the effect of this might range from harmless program misbehavior to a program crash if for instance you overwrite the return address pushed in the stack... Data corruption is to me one of the hardest (if not the hardest) bugs category to deal with since its effect can affect your system randomly with non-deterministic pattern and might happen long after the original offending instructions were actually executed.
If you want to understand more about internal memory management, you probably want to look into Virtual Memory Management in Operating System designs.

Is unused memory in address space protected

Is the unused memory in the address space of a process protected by just having read permission, so that writing to a location pointed to by an uninitialized pointer, for example, always causes a page fault to be trapped by the OS? Or is that not the case, and every memory location besides the code (which of course is given read-only access) is given write access?
I'm asking this because my friend was showing me his code where he didn't initialize a pointer and wrote in the memory pointed by it, but still his program wasn't crashing with mingw gcc compiler for windows but always crashing with visual c++, in mac or linux.
What I think is that the OS does not protect memory for unused areas and the crashing was being caused because in the code generated by mingw, the random pointer value was pointing to some used area such as stack, heap or code, while in other cases it was pointing to some free area. But if the OS really doesn't protect the unused areas, wouldn't these sorts of bugs, such as uninitialized pointers, be difficult to debug?
I guess this is why it is advised to always assign NULL to a pointer after calling delete or free, so that when something is accessed with it, it really causes a visible crash.
Uninitialized pointers don't necessarily point to unused address space. They could very well hold values that happen to point to writeable memory, such as a pointer on the stack that happens to sit where a previously executed function stored a valid address.
In a typical, current server/desktop OS (and quite a few smaller systems such as cell phones as well) you have virtual memory. This means the OS builds a table that translates from the virtual address your code uses, to a physical address that specifies the actual memory being addressed. This mapping is normally done in "pages" -- e.g., a 4KB chunk of memory at a time.
At least in the usual case, parts of the address space that aren't in use at all simply won't be mapped at all -- i.e., the OS won't build an entry in the table for that part of the address space. Note, however, that memory that is allocated will (of necessity) be rounded to a multiple of the page size, so each chunk of memory that's in use will often be followed by some small amount that's not really in use, but still allocated and "usable". Since protection is also (normally) done on a per-page basis, if the rest of that page is (say) Read-only, the remainder at the tail end will be the same.
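The rounding is simple arithmetic. Here is a sketch assuming the 4KB page size used as the example above; real systems vary and you would normally ask the OS for the actual value. The names ASSUMED_PAGE_SIZE, round_up_to_page and slack_in_last_page are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ASSUMED_PAGE_SIZE ((size_t)4096)   /* assumption: 4KB pages */

/* Round a request up to a whole number of pages. */
size_t round_up_to_page(size_t bytes)
{
    assert(bytes <= SIZE_MAX - (ASSUMED_PAGE_SIZE - 1));  /* guard against overflow */
    return (bytes + ASSUMED_PAGE_SIZE - 1) / ASSUMED_PAGE_SIZE * ASSUMED_PAGE_SIZE;
}

/* Bytes past the requested size that share the final page, and therefore
   share its protection, even though nobody asked for them. */
size_t slack_in_last_page(size_t bytes)
{
    return round_up_to_page(bytes) - bytes;
}
```

A 5000-byte region occupies two pages (8192 bytes), so a stray write up to 3192 bytes past its end lands in mapped memory and does not fault, which is one reason such bugs can go unnoticed.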
It depends on the implementation of the OS. In some configurations, for example, ExecShield will protect most of the memory that lies outside the bounds of the program, and it is also common for the first few bytes of the data segment to be protected (to catch accesses through NULL pointers), but it may be the case that the pointer actually points to a valid, arbitrary memory address within the program.
Memory protection is not provided by c/c++. You may find that the pointer just happens to contain a pointer to valid memory, e.g. a previous function has a ptr variable on the stack and another function called later just happens to use the same stack space for a pointer.
The following code may print "Hello" when compiled and run with gcc (typically without optimization, since it relies on undefined behavior):
#include <stdio.h>
char buffer[10];
void function1(void) {
char * ptr = buffer;
sprintf(buffer, "Hello");
return;
}
void function2(void) {
char * ptr2;
printf("%s\n", ptr2);
}
int main(int argc, char * argv[]) {
function1();
function2();
}
For debug builds some compilers (I know that Visual Studio used to do this) will secretly initialise all variables like ptr2 to a bad value to detect these kinds of error.
With C normally you find out that memory has been abused by the OS killing your program.
Simply, I assume the answer is "No, unused memory in the address space is not protected." C isn't sophisticated enough to handle such instances.
