reading a arbitary memory location making the program crash - c

I am trying to read the value at random memory location using the following c code
main()
{
int a,*b;
printf("enter the value of a");
scanf("%d",&a);
b=a;
printf("%d\n%d\n",a,*b);
getch();
}
But the program is crashing when some negative values or even when zero is entered in place of variable a through scanf.
What I am doing wrong? Does the pointers dont have negative values?

The thing is that as you are probably running on a modern, full service operating system and it provides a more complicated abstract machine than the one described in most intro to c programming books.
In particular, there are OS imposed restrictions on access to arbitrary addresses. You don't notice this when you look at addresses associated with standard variables because you do have permission to use the stack and the data segment, and the alloc family of functions takes care of making sure that you have permission to access the parts of the heap that they hand you.
So, what you are doing is accessing memory for which you do not have permission, which results in a failure called a "segmentation fault" on most OS, and that abruptly ends your program.
What can you do about it?
Allocate a big block on the heap, by calling char *p = malloc(1000000); and then find the starting and ending addresses with something like printf("start:\t%p\nend\t%p\n",(void*)p,(void*)(p+1000000)); and only entering numbers in that range.
Note that the %p specifier print specifier outputs in hexadecimal, so you probably want to enter address in the same base. The standard library function strtol will be helpful in that regard.
A more sophisticated approach would be to use your OS's API to request access permission for arbitrary address, but for some values the OS is likely to simply refuse.

I see some confusion here over just want a pointer is.
First, you ask the user for a value. This is fine. Then you assign that value as the location of the pointer b. This MAY be fine but likely not.
Think for a moment, what does *(-500) mean? What would *(0) mean?
In general you can never just take user input and use it without first checking it or manipulating it. This is one place where security breaches come from.
If you want to experiment with dereferencing memory, just hard code some values at first. Load the program up in a debugger and see what happens.
int c;
b = 500;
c = *b; // what happens?
b = 0;
c = *b; // what happens?
b = -100;
c = *b; // what happens?

Let me greatly oversimplify for you...
In almost all modern computers, with most operating systems, very little of the memory in the machine is directly addressable by your program. You can't take a pointer, point it at something, and try to read it. It will almost always fail.
There are generally three things that will go wrong:
The memory doesn't exist where you're pointing. Pointers can hold large range of values, and not all of them mean anything. It's like a house number in a postal address. Technically, you can put anything you want on the envelope. Only some are valid.
The memory exists, but isn't yours. The vast majority of memory in the computer is "owned" by the operating system and if you touch it, it will terminate your program. This is for your safety.
The memory you're trying to address is valid, in the right range, but not quite the right type. From the earlier example, you might have a reasonable house number but there's no house at that location. Or the address is really an apartment and just a number won't do.
In an old 8-bit computer from the 1980's with a full 64k of memory, you could just read any location you wanted to and it would be fine. Not so much anymore.

In theory, you have permissions to read from any address within your virtual address space (e.g. 0 to 0xFFFFFFFF on a 32-bit machine).
0 and negative numbers are not a problem - once you assign them to pointers, they are casted to non-negative values.
In practice, it won't work. OS will protect itself (and you) from this - it won't let you read from address that doesn't really belong to you.
That is, if you haven't allocated the memory page and haven't write something there, OS won't let you to read from there.
Moreover, you don't really own the whole address space - lower part of it is owned by kernel et al., so OS won't let you access it.

Pointers are exclusively positive from what I have heard. 0 (the NULL pointer) is guaranteed not to point at anything and will cause the program to halt. Further, operatings systems (even hardware if I remember correctly) provide memory protection. This is why programs used to be able to crash each other but this is now much less common. When you program runs, the OS decides what memory it has access to and will throw a segfault if you try to access memory that isn't yours.
Then again, perhaps you just wanted b = &a? This would make b point to the same place as a exists and so, when you *b it would equate to the value stored in a.

As i see you declare b as pointer, hence it is wrong to do a=b. You will get segmentation fault. Pointers only shows to pointers not to values of integers, floats or chars.
Alternatively you could do b = &a, which means that b shows to the memory address of a. So you could print then the value stored in the a.

Related

Segfault when dereferencing a custom mem address (C)

I want to declare a pointer, have it hold a custom address and then assign a value to it:
void main()
{
char *ptr;
ptr = (char *)0x123123; //the assignment works perfectly with a cast
printf("%p\n", ptr); //and the pointer indeed holds the address it's supposed to
*ptr = 'a'; //but this breaks
puts("2");
}
Initially I thought the reason is because I'm trying to dereference uninitialized memory. But I doubt actually that this is the case, since this some_type *some_ptr = &some_variable; works flawlessly, so the deal must be the address I assign it to.
Then I thought, in the same way 3 or 'a' or "alpine" are constants, (char *) 0x123123 must be a constant too. And const-s can't be edited in C, but that still can't be it, because an attempt to change a const value will not compile.
3rd assumption would be that such an address must be unavailable, but this doesn't make sense either, because line 4 works always, no matter the address I give or the type of the pointer.
3rd assumption would be that such an address must be unavailable,
That is correct: on modern OSes (which all have memory protection) you can't write to arbitrary memory address.
It used to be possible to access any memory on OSes that didn't utilize virtual memory (such as MS-DOS), but allowing that is generally a very bad idea -- it allowed random program to corrupt OS state, and required very frequent reboots.
but this doesn't make sense either, because line 4 works always, no matter the address I give or the type of the pointer.
You confuse two distinct operations: printing an address (allowed no matter what that address is) and dereferencing an address, i.e. reading or modifying the value stored at the address (only allowed for valid addresses).
The distinction is similar to "can you print an address?" (e.g. "123 Main Street, SomeTown, SomeCountry"), and "can you enter a house at that address?" (not possible for above address because there is no "SomeCountry" on Earth). Even if the address is valid, e.g. "1600 Pennsylvania Ave NW, Washington, DC 20500", you may still not be allowed to enter it.
The OP clarified elsewhere, that this is actually an XY problem.
The X problem: reading/writing to arbitrary memory locations.
The Y problem: implementing a linked list that uses consecutive memory.
Of course, the answer to that is: one has to implement his complete own memory management system to get there.
As in: first, you use malloc() to acquire a large block of consecutive memory. Then you can use arbitrary pointers within that block of memory. But of course, your code has to track which addresses are already used. Or to correctly "free" up when list nodes get deleted.
The tricky part is about handling the corner cases, such as: what happens when your last "pointer" gets used up? Do you malloc() a larger area, and move all data in memory?
Finally: assume that you don't manage a block of memory, but a single array. ( linked lists implementations are often based on arrays, as that makes some things much easier )
writing to some arbitrary memory address is dangerous and not allowed by modern operating systems, better to create a memory blob and write to that.
e.g. using malloc :
ptr = malloc(32); // now you can write to this memory block and it perfectly legal
*ptr = 'a';

Declare a pointer to an integer at address 0x200 in memory

I have a couple of doubts, I remember some where that it is not possible for me to manually put a variable in a particular location in memory, but then I came across this code
#include<stdio.h>
void main()
{
int *x;
x=0x200;
printf("Number is %lu",x); // Checkpoint1
scanf("%d",x);
printf("%d",*x);
}
Is it that we can not put it in a particular location, or we should not put it in a particular location since we will not know if it's a valid location or not?
Also, in this code, till the first checkopoint, I get output to be 512.
And then after that Seg Fault.
Can someone explain why? Is 0x200 not a valid memory location?
In the general case - the behavior you will get is undefined - everything can happen.
In linux for example, the first 1GB is reserved for kernel, so if you try to access it - you will get a seg fault because you are trying to access a kernel memory in user mode.
No idea how it works in windows.
Reference for linux claim:
Currently the 32 bit x86 architecture is the most popular type of
computer. In this architecture, traditionally the Linux kernel has
split the 4GB of virtual memory address space into 3GB for user
programs and 1GB for the kernel.
Adding to what #amit wrote:
In windows it is the same. In general it is the same for all protected-mode operating systems. Since DOS etc. are no longer around it is the same with all systems except kernel-mode (km-drivers) and embedded systems.
The operating system manages which memory-pages you are allowed to write to and places markers that will make the cpu automatically raise access-violations if some other page is written to.
Up until the "checkpoint", you haven't accessed memory location 0x200, so everything works fine.
There I'd a local variable x in the function main. It is of type "pointer to int". x is assigned the value 0x200, and then that value is printed. But the target of x hasn't been accessed, so up to this point it doesn't matter whether x holds a valid memory address or not.
Then scanf tries to write to the memory address you passed in, which is the 0x200 stored in x. Then you get a seg fault, which is certainly sac possible result of trying to write to an arbitrary memory address.
So what are your doubts? What makes you think that this might work, when you come across this code that clearly doesn't?
Writing to a particular memory address might work under certain conditions, but is extremely unlikely to in general. Under all modern OSes, normal programs do not have control over their memory layout. The OS decides where initial things like the program's code, stack, and globals go. The OS will probably also be using some memory space, and it is not required to tell you what it's using. Instead you ask for memory (either by making variables or by calling memory allocation routines), and you use that.
So writing to particular addresses is very very likely to get either memory that hasn't been allocated, or memory that is being used for some other purpose. Neither of those is good, even if you do manage to hit an address that is actually writable. What if you clobber sundry some piece of data used by one of your program's other variables? Or some other part of your program clobbers the value you just wrote?
You should never be choosing a particular hard-coded memory address, you should be using an address of something you know is a variable, or an address you got from something like malloc.

Garbage values in a multiprocess operating system

Does the allocated memory holds the garbage value since the start of the OS session? Does it have some significance before we name it as a garbage value in our program runtime session? If so then why?
I need some advice on study materials regarding linux kernel programming, device driver programming and also want to develop an understanding on how the computer devices actually work. I get stuck into the situations like the "garbage value" and feel like I have to study something else also for better understanding of the programming language. I am studying by myself and getting a lot of confusing situations. Any advice will be really helpful.
"Garbage value" is a slang term, meaning "I don't know what value is there, or why, and for that reason I will not use the value". It is "garbage" in the sense of "useless nonsense", and sometimes it is also "garbage" in the sense of "somebody else's leavings".
Formally, uninitialized memory in C takes "indeterminate values". This might be some special value written there by the C implementation, or it might be something "left over" by an earlier user of the same memory. So for examples:
A debug version of the C runtime might fill newly-allocated memory with an eye-catcher value, so that if you see it in the debugger when you were expecting your own stored data, you can reasonably conclude that either you forgot to initialize it or you're looking in the wrong place.
The kernel of a "proper" operating system will overwrite memory when it is first assigned to a process, to avoid one process seeing data that "belongs" to another process and that for security reasons should not leak across process boundaries. Typically it will overwrite it with some known value, like 0.
If you malloc memory, write something in it, then free it and malloc some more memory, you might get the same memory again with its previous contents largely intact. But formally your newly-allocated buffer is still "uninitialized" even though it happens to have the same contents as when you freed it, because formally it's a brand new array of characters that just so happens to have the same address as the old one.
One reason not to use an "indeterminate value" in C is that the standard permits it to be a "trap representation". Some machines notice when you load certain impossible values of certain types into a register, and you'd get a hardware fault. So if the memory was previously used for, say, an int, but then that value is read as a float, who is to say whether the left-over bit pattern represents a so-called "signalling NaN", that would halt the program? The same could happen if you read a value as a pointer and it's mis-aligned for the type. Even integer types are permitted to have "parity bits", meaning that reading garbage values as int could have undefined behavior. In practice, I don't think any implementation actually does have trap representations of int, and I doubt that any will check for mis-aligned pointers if you just read the pointer value -- although they might if you dereference it. But C programmers are nothing if not cautious.
What is garbage value?
When you encounter values at a memory location and cannot conclusively say what these values should be then those values are garbage value for you. i.e: The value is Indeterminate.
Most commonly, when you use a variable and do not initialize it, the variable has an Indeterminate value and is said to possess a garbage value. Note that using an Uninitialized variable leads to an Undefined Behavior, which means the program is not a valid C/C++ program and it may show(literally) any behavior.
Why the particular value exists at that location?
Most of the Operating systems of today use the concept of virtual memory. The memory address a user program sees is an virtual memory address and not the physical address. Implementations of virtual memory divide a virtual address space into pages, blocks of contiguous virtual memory addresses. Once done with usage these pages are usually at least 4 kilobytes. These pages are not explicitly wiped of their contents they are only marked as free for reuse and hence they still contain the old contents if not properly initialized.
On a typical OS, your userspace application only sees a range of virtual memory. It is up to the kernel to map this virtual memory to actual, physical memory.
When a process requests a piece of (virtual) memory, it will initially hold whatever is left in it -- it may be a reused piece of memory that another part of the process was using earlier, or it may be memory that a completely different process had been using... or it may never have been touched at all and be in whatever state it was when you powered on the machine.
Usually nobody goes and wipes a memory page with zeros (or any other equally arbitrary value) on your behalf, because there'd be no point. It's entirely up to your application to use the memory in whatever way you please, and if you're going to write to it anyway, then you don't care what was in it before.
Consequently, in C it is simply not allowed to read a variable before you have written to it, under pain of undefined behaviour.
If you declare a variable without initialising it to a particular value, it may contain a value which was previously assigned by a different program that has since released that piece of memory, or it may simply be a random value from when the computer was booted (iirc, PCs used to initialise all RAM to 0 on bootup because early versions of DOS required it, but new computers no longer do this). You can't assume the value will be zero, for instance.
Garbage value, e.g. in C, typically refers to the fact that if you just reserve memory, but never intialize it, it will hold random values, since it simply is not initialized yet (C doesn't do that for you automatically; it would just be overhead, and C is designed for as little overhead as possible).
The random values in the memory are leftovers from whatever was in there before.
These previous values are left in there, because usually there is not much use in going around setting memory to zero - or any other value - that will later be overwritten again anway. Because for the general case, there is no use in reading uninitialized memory (except if you e.g. want to exploit possible security issues - see the special cases where memory is actually zeroed: Kernel zeroes memory?).

C Tutorial - Wonder about `int i = *(int *)&s;`

Working my way through a C tutorial
#include <stdio.h>
int main() {
short s = 10;
int i = *(int *)&s; // wonder about this
printf("%i", i);
return 0;
}
When I tell C that the address of s is an int, should it not read 4 bytes?
Starting from the left most side of 2 bytes of s. In which case is this not critically dangerous as I don't know what it is reading since the short only assigned 2 bytes?
Should this not crash for trying to access memory that I haven't assigned/belong-to-me?
Don't do that ever
Throw away the tutorial if it teaches/preaches that.
As you pointed out it will read more bytes than that were actually allocated, so it reads off some garbage value from the memory not allocate by your variable.
In fact it is dangerous and it breaks the Strict Aliasing Rule[Detail below] and causes an Undefined Behavior.
The compiler should give you a warning like this.
warning: dereferencing type-punned pointer will break strict-aliasing rules
And you should always listen to your compiler when it cries out that warning.
[Detail]
Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other.)
The exception to the rule is a char*, which is allowed to point to any type.
First of all, never do this.
As to why it doesn't crash: since s is a local, it's allocated on the stack. If short and int have different sizes in your architecture (which is not a given), then you will probably end up reading a few more bytes from memory that's on the same memory page as the stack; so and there will be no access violation (even though you will read garbage).
Probably.
This is dangerous and undefined behaviour, just as you said.
The reason why it doesn't crash on 32 (or 64) bit platforms is that most compilers allocate atleast 32 bits for each stack variable. This makes the access faster, but on e.g. 8 bit processor you would get garbage data in the upper bits instead.
No it's not going to crash your program, however it is going to be reading a portion of other variables (or possibly garbage) on the stack. I don't know what tutorial you got this from, but that kind of code is scary.
First of all, all addresses are of the same size and if you're in a 64bit architecture, each char *, short * or int * will have 8 bytes.
When using a star before an ampersand it will cancel the effect, so *&x is semantically equivalent to just x.
Basically you are right in the sense that since you are accessing an int * pointer, this will fetch 4 bytes instead of the only 2 reserved for 's' storage and the resulting content won't be a perfect reflection of what 's' really means.
However this most likely won't crash since 's' is located on the stack so depending on how your stack is laid out at this point, you will most likely read data pushed during the 'main' function prologue...
See for a program to crash due to invalid read memory access, you need to access a memory region that is not mapped which will trigger a 'segmentation fault' at the userworld level while a 'page fault' at the kernel level. By 'mapped' I mean you have a known mapping between a virtual memory region and a physical memory region (such mapping is handled by the operating system). That is why if you access a NULL pointer you will get such exception because there is no valid mapping at the userworld level. A valid mapping will usually be given to you by calling something like malloc() (note that malloc() is not a syscall but a smart wrapper around that manages your virtual memory blocks). Your stack is no exception since it is just memory like anything else but some pre-mapped area is already done for you so that when you create a local variable in a block you don't have to worry about its memory location since that's handled for you and in this case you are not accessing far enough to reach something non-mapped.
Now let's say you do something like that:
short s = 10;
int *i = (int *)&s;
*i = -1;
Then in this case your program is more likely to crash since in this case you start overwriting data. Depending on the data you are touching the effect of this might range from harmless program misbehavior to a program crash if for instance you overwrite the return address pushed in the stack... Data corruption is to me one of the hardest (if not the hardest) bugs category to deal with since its effect can affect your system randomly with non-deterministic pattern and might happen long after the original offending instructions were actually executed.
If you want to understand more about internal memory management, you probably want to look into Virtual Memory Management in Operating System designs.
Hope it helps,

Using Malloc to allocate array size in C

In a program I'm writing, I have an array of accounts(account is a struct I made). I need this visible to all functions and threads in my program. However, I won't know the size it has to be until the main function figures that out. so I created it with:
account *accounts;
and try to allocate space to it in main with this:
number of accounts = 100 //for example
accounts = (account*)malloc(numberOfAccounts * sizeof (account));
However, it appears to be sizing the array larger than it needs to be. For example, accounts[150] exists, and so on.
Is there anything I am doing wrong? How can I get the size of accounts to be exactly 100?
Thanks
You can't do that - malloc() doesn't provide any guarantees about how much memory it actually allocates (except that if it succeeds it will return a pointer to at least as much as you requested). If you access anything outside the range you asked for, it causes undefined behaviour. That means it might appear to work, but there's nothing you can do about that.
BTW, in C you don't need to typecast the returned value from malloc().
Even though it may look like it, accounts[150] does not truly exist.
So why does your program continue to run? Well, that's because even though accounts[150] isn't a real element, it lies within the memory space your program is allowed to access.
C contains no runtime checking of indexes - it just calculates the appropriate address and accesses that. If your program doesn't have access to that memory address, it'll crash with a segmentation fault (or, in Windows terms, an access violation). If, on the other hand, the program is allowed to access that memory address, then it'll simply treat whatever is at that address as an account.
If you try to modify that, almost anything can happen - depending on a wide variety of factors, it could modify some other variables in your program, or given some very unlucky circumstances, it could even modify the program code itself, which could lead to all kinds of funky behavior (including a crash). It is even possible that no side effects can ever be observed if malloc (for whatever reason) allocated more memory than you explicitly requested (which is possible).
If you want to make sure that such errors are caught at runtime, you'll have to implement your own checking and error handling.
I can't seem to find anything wrong with what you provide. If you have a struct, e.g.:
struct account{
int a,b,c,d;
float e,f,g,h;
}
Then you can indeed create an array of accounts using:
struct account *accounts = (struct account *) malloc(numAccounts * sizeof(account));
Note that for C the casting of void* (retun type of malloc) is not necessary. It will get upcasted automatically.
[edit]
Ahhh! I see your problem now! Right. Yes you can still access accounts[150], but basically what happens is that accounts will point to some memory location. accounts[150] simply points 150 times the size of the struct further. You can get the same result by doing this:
*(accounts + 150), which basically says: Give me the value at location accounts+150.
This memory is simply not reserved, and therefore causes undefined behavior. It basically comes down to: Don't do this!
Your code is fine. When you say accounts[150] exits do you mean exits or exists?
If your code is crashing when accessing accounts[150] (assuming numberOfAccounts = 100) then this is to be expected you are accessing memory outside that you allocated.
If you meant exists it doesn't really, you are just walking off the end of the array and the pointer you get back is to a different area of memory than you allocated.
Size of accounts is exacly for 100 structures from malloc result pointer starts if this address is non-zero.
Just because it works doesn't mean it exists as part of the memory you allocated, most likely it belongs to someone else.
C doesn't care or know that your account* came from malloc, all it knows is that is a memory pointer to something that is sizeof(account).
accounts[150] accesses the 150th account-sized object from the value in the pointer, which may be random data, may be something else, depending on your system it may even be your program.
The reason things seem to "work" is that whatever is there happens to be unimportant, but that might not always be the case.

Resources