Is there a way to test for an invalid memory location?


In a language like C, for example, if a routine receives a pointer, is there any system call or other test that can be applied to the pointer to check that it is a valid memory location, other than catching SIGSEGV or equivalent?

No, you can't check for sure whether the address is invalid. Even if you used some operating system function to test whether the address is mapped into the address space, you still can't be sure whether the address belongs to some service data that you should not read or modify.
One good example: if your program uses Microsoft RPC to accept calls from another program, you have to implement a set of callback functions to serve the requests. Those callback functions run on separate threads started by RPC. You don't know when those threads start or what their stack size is, so you can't detect a buffer overrun if you write through an address that is meant to refer to a stack variable but accidentally points into the stack of another thread.
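As a rough illustration of the kind of operating system test mentioned above, here is a minimal sketch for Linux. It asks the kernel to copy one byte from the address into a pipe; on Linux, write() fails with EFAULT when the source address is not readable. The helper name is_readable_byte is hypothetical, and the behaviour on other systems is an assumption. As the answer says, a positive result only means the byte is mapped and readable, not that it belongs to the object you think it does.

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: returns true if the kernel can read one byte at addr.
   Relies on Linux write() failing (EFAULT) for an unreadable source buffer. */
bool is_readable_byte(const void *addr)
{
    int fds[2];
    if (pipe(fds) != 0)
        return false;                    /* cannot probe; treat as not readable */

    ssize_t n = write(fds[1], addr, 1);  /* kernel copies 1 byte from addr */
    bool ok = (n == 1);

    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

Creating a pipe per probe is slow, and the answer can be stale immediately afterwards if another thread unmaps the page, so this is a debugging aid at best.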

Well, if you knew where the memory being pointed to was being stored (on the stack, for instance), you could check to see if it's in a certain 'range' that is the approximate address range of the stack. That could also work for something on the heap, if you have an idea of how big your heap "should" be. It's definitely not a fail-safe approach, but I'm unaware of any sure-fire methods for checking the 'validity' of a pointer.
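A minimal sketch of such a range check, assuming you already know the base and size of the region you own (for example an arena your code allocated). The helper name in_region is hypothetical. Addresses are compared as uintptr_t values because relational comparison of pointers into different objects is not defined by the C standard.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if the len bytes starting at p lie entirely
   inside the size bytes starting at base. */
bool in_region(const void *p, size_t len, const void *base, size_t size)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t b = (uintptr_t)base;

    if (a < b)
        return false;                /* starts before the region */

    size_t off = (size_t)(a - b);    /* offset of p from the region start */
    return off <= size && len <= size - off;
}
```

This only tells you the pointer falls inside a region you know about; it says nothing about whether it points at the start of a live object within that region.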

If you mean purely within your own application, you can establish a convention that any memory allocated by your code is initialized in a way you can recognize. E.g. in one project I saw, they wrote an eyecatcher in the first few bytes. In some products I know, they write a unique id at the start and at the end, and each time the memory is accessed they check that the two ids still match, to show it has not been corrupted. E.g. CICS on z/Series does the latter.
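The start-and-end id convention could be sketched roughly like this. All names (guarded_alloc, guarded_ok, guarded_free, GUARD_ID) and the id value are made up for illustration, and this is not how CICS or any particular product implements it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_ID UINT32_C(0xC0FFEE42)   /* arbitrary eyecatcher value */

typedef struct { uint32_t id; size_t size; } guard_head;

/* Allocates size usable bytes with an id before and after them. */
void *guarded_alloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(guard_head) - sizeof(uint32_t))
        return NULL;                                  /* size would overflow */
    unsigned char *raw = malloc(sizeof(guard_head) + size + sizeof(uint32_t));
    if (!raw)
        return NULL;
    guard_head h = { GUARD_ID, size };
    uint32_t tail = GUARD_ID;
    memcpy(raw, &h, sizeof h);                        /* id at the start */
    memcpy(raw + sizeof h + size, &tail, sizeof tail);/* id at the end   */
    return raw + sizeof h;
}

/* Checks that both ids are still intact. Only meaningful for pointers
   that really came from guarded_alloc. */
bool guarded_ok(const void *user)
{
    const unsigned char *raw = (const unsigned char *)user - sizeof(guard_head);
    guard_head h;
    uint32_t tail;
    memcpy(&h, raw, sizeof h);
    if (h.id != GUARD_ID)
        return false;
    memcpy(&tail, raw + sizeof h + h.size, sizeof tail);
    return tail == GUARD_ID;
}

void guarded_free(void *user)
{
    if (user)
        free((unsigned char *)user - sizeof(guard_head));
}
```

The check catches overruns that trample an id, but calling guarded_ok on a pointer that never came from guarded_alloc reads memory in front of it, which is exactly the kind of access the original question wanted to avoid.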

Related

What exactly is the value contained in an uninitialized local variable in C?

If we have a function in C with a simple uninitialized int variable in it, we know that this variable is not always initialized to zero. Instead, it may contain some "garbage" value.
My question is: what exactly does that value represent? Can it be information left behind (unfreed memory) by a process that terminated earlier?
If yes, wouldn't this be an extremely serious security breach? That way any process could read information left by processes that used the same address space as the current process (passwords, tokens, etc.).
My assumption is that for each new process, the kernel zeroes the memory allocated for that new process (at least for the stack) and then it loads the executable into memory. Those "garbage" values are actually values generated by the loading procedure of the current process (so that there is no way to access any left data from other processes that used the same address space).
I'm arguing with some fellows on this topic and I really want a clear and comprehensive answer to this (I'm sure there is one). We are assuming that the kernel is debian/centos based. It would be great to know if there are differences in behaviour for different kernels / OS-es.
Thank you respectfully.

This should be separated into two questions:
What does the C standard say about the value of an uninitialized object?
What is in memory when main is called?
The first question is discussed in other Stack Overflow questions and answers. A full answer is complicated and involves a discussion of a variety of circumstances, and this question does not seem to be asking about that particularly, so I will leave it for the other Stack Overflow questions. For this question, suffice it to say that using the value of an uninitialized object is prone to undefined behavior. Further, this is not simply because the memory of the object might have troublesome values but because the C standard permits a C implementation to treat a program that reads an uninitialized value as a misbehaving program in various ways, and optimizations can then disrupt the program further.
As far as what is in memory is concerned (supposing we have a supported way to examine it, perhaps by using assembly language instead of C), then every multiuser system that provides any sort of security erases (or otherwise initializes) memory before making it available to a process. Any values that are in memory at the time main is called are, as the question contemplates, either the result of the loading process or of initialization by the operating system. (Note that the result of the loading process includes both loading of constant data and program text—so we would expect to find the defined values there—and whatever data is leftover from the work done by the loading code—its variables and so on.)
The question asks for a clear answer, so let me be clear about this: An operating system that provides security for user processes must erase data of previous processes from memory before making that memory available to another process. Security cannot be provided by trusting a program not to examine the memory it is given and doing whatever it wants with it.
Rudimentary systems not intended for sharing by untrusted users can of course skip the initialization of memory when creating new processes and allocating memory for them.
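On Linux you can observe this initialization directly: memory obtained fresh from the kernel through an anonymous mmap is documented to be zero-filled. The sketch below is Linux-specific, and the function name fresh_pages_are_zero is hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1      /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Maps len bytes of fresh anonymous memory and inspects it.
   Returns 1 if every byte is zero, 0 if not, -1 if the mapping failed. */
int fresh_pages_are_zero(size_t len)
{
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    int all_zero = 1;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            all_zero = 0;
            break;
        }
    }
    munmap(p, len);
    return all_zero;
}
```

Reading these bytes is well defined because the kernel has written them. That is different from reading an uninitialized C object, which the language still treats as an error.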

Well, local variables are stored in stack space, so once the call to the current routine finishes, the stack pointer moves up to free all of that routine's local variables and, for efficiency reasons, the previous contents are not erased (only the stack pointer is moved).
When you enter a new routine, the compiler moves the stack pointer down (it doesn't push anything into the local variable space, it just skips over that space to make room for the new set of local variables) and doesn't use that space until a local variable is needed in the code. What you are asking is how to interpret the bit pattern that the stack segment holds from previous use, and that depends on how the stack was used before entering the current routine. This can be:
remains of temporary data used to calculate a complex expression.
parameter data of a previous call to another routine.
return addresses of previously called routines.
local variables of a previously called routine that has ended, so they are no longer in use.
anything else.
As that memory is now used in a different way (as the local space of the current routine dictates), there's no valid interpretation of its contents other than as trashed data from old code.

Garbage values in a multiprocess operating system

Does allocated memory hold the garbage value since the start of the OS session? Does it have some significance before we call it a garbage value in our program's run-time session? If so, why?
I need some advice on study materials for Linux kernel programming and device driver programming, and I also want to develop an understanding of how computer devices actually work. I get stuck on situations like the "garbage value" and feel that I have to study something else as well to understand the programming language better. I am studying by myself and running into a lot of confusing situations. Any advice will be really helpful.

"Garbage value" is a slang term, meaning "I don't know what value is there, or why, and for that reason I will not use the value". It is "garbage" in the sense of "useless nonsense", and sometimes it is also "garbage" in the sense of "somebody else's leavings".
Formally, uninitialized memory in C takes "indeterminate values". This might be some special value written there by the C implementation, or it might be something "left over" by an earlier user of the same memory. So for examples:
A debug version of the C runtime might fill newly-allocated memory with an eye-catcher value, so that if you see it in the debugger when you were expecting your own stored data, you can reasonably conclude that either you forgot to initialize it or you're looking in the wrong place.
The kernel of a "proper" operating system will overwrite memory when it is first assigned to a process, to avoid one process seeing data that "belongs" to another process and that for security reasons should not leak across process boundaries. Typically it will overwrite it with some known value, like 0.
If you malloc memory, write something in it, then free it and malloc some more memory, you might get the same memory again with its previous contents largely intact. But formally your newly-allocated buffer is still "uninitialized" even though it happens to have the same contents as when you freed it, because formally it's a brand new array of characters that just so happens to have the same address as the old one.
One reason not to use an "indeterminate value" in C is that the standard permits it to be a "trap representation". Some machines notice when you load certain impossible values of certain types into a register, and you'd get a hardware fault. So if the memory was previously used for, say, an int, but then that value is read as a float, who is to say whether the left-over bit pattern represents a so-called "signalling NaN", that would halt the program? The same could happen if you read a value as a pointer and it's mis-aligned for the type. Even integer types are permitted to have "parity bits", meaning that reading garbage values as int could have undefined behavior. In practice, I don't think any implementation actually does have trap representations of int, and I doubt that any will check for mis-aligned pointers if you just read the pointer value -- although they might if you dereference it. But C programmers are nothing if not cautious.

What is garbage value?
When you encounter values at a memory location and cannot conclusively say what those values should be, then those values are garbage values for you, i.e. the value is indeterminate.
Most commonly, when you use a variable without initializing it, the variable has an indeterminate value and is said to hold a garbage value. Note that using an uninitialized variable leads to undefined behavior, which means the program is not a valid C/C++ program and may show (literally) any behavior.
Why the particular value exists at that location?
Most operating systems today use the concept of virtual memory. The memory address a user program sees is a virtual memory address, not the physical address. Implementations of virtual memory divide a virtual address space into pages, blocks of contiguous virtual memory addresses. These pages are usually at least 4 kilobytes. Once a program is done with them, these pages are not explicitly wiped of their contents; they are only marked as free for reuse, and hence they still contain the old contents if not properly initialized.

On a typical OS, your userspace application only sees a range of virtual memory. It is up to the kernel to map this virtual memory to actual, physical memory.
When a process requests a piece of (virtual) memory, it will initially hold whatever is left in it -- it may be a reused piece of memory that another part of the process was using earlier, or it may be memory that a completely different process had been using... or it may never have been touched at all and be in whatever state it was when you powered on the machine.
Usually nobody goes and wipes a memory page with zeros (or any other equally arbitrary value) on your behalf, because there'd be no point. It's entirely up to your application to use the memory in whatever way you please, and if you're going to write to it anyway, then you don't care what was in it before.
Consequently, in C it is simply not allowed to read a variable before you have written to it, under pain of undefined behaviour.
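In practice the rule is satisfied by making sure every object is written before it is read. A minimal sketch, with a made-up record type: an initializer for automatic variables and calloc for heap memory both guarantee that.

```c
#include <stdlib.h>

/* Hypothetical record type used only for this illustration. */
typedef struct {
    int    id;
    double weight;
    char   name[16];
} record;

/* Automatic storage: the "= {0}" initializer zeroes every member. */
record record_empty(void)
{
    record r = {0};
    return r;
}

/* Heap storage: calloc returns memory with all bytes set to zero
   (or NULL on failure), unlike malloc, which leaves it indeterminate. */
record *record_array(size_t n)
{
    return calloc(n, sizeof(record));
}
```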

If you declare a variable without initialising it to a particular value, it may contain a value which was previously assigned by a different program that has since released that piece of memory, or it may simply be a random value from when the computer was booted (iirc, PCs used to initialise all RAM to 0 on bootup because early versions of DOS required it, but new computers no longer do this). You can't assume the value will be zero, for instance.

Garbage value, e.g. in C, typically refers to the fact that if you just reserve memory but never initialize it, it will hold random values, since it simply is not initialized yet (C doesn't do that for you automatically; it would just be overhead, and C is designed for as little overhead as possible).
The random values in the memory are leftovers from whatever was in there before.
These previous values are left in there because usually there is not much use in going around setting memory to zero, or any other value, that will later be overwritten again anyway. In the general case, there is no use in reading uninitialized memory (except if you e.g. want to exploit possible security issues; see the special cases where memory is actually zeroed: Kernel zeroes memory?).

What can cause Program Counter to have an invalid address?

I am getting an exception "Invalid Program Counter Address" in Vxworks + PPC 603.
The application links to multiple 'C' libraries. I am not able to pin down what could cause this problem.
Is there a possibility that incorrect compilation options could be causing this?
Any directions or pointers will be helpful.
Thanks
UPDATE:
I have a structure whose members are function pointers. The structure itself is static; its address is passed around, and different functions are invoked through the structure.
During one of the test rounds, I found that in the function pointer, the function address value is reduced by 1. If the function address is 0x009a3730, the PC is having 0x00913729.
Also, if I change the compiler options, the place of crash or the number of runs after which the crash happens changes.

Any case where you're working with function pointers can easily lead to this, if the pointer value gets corrupted and is later called. Check signal handlers, if any, and any other APIs that deal with callbacks.

"If the function address is 0x009a3730, the PC is having 0x00913729". The difference here is not 1 :) However PC will always point to the address of the next instruction it has to execute AFAIK.
Maybe you could load the core dump in a debugger and print out:
the back trace
a disassembly of the code around the region of the crash
info registers -> register values at the time of the crash
info locals -> local variables of the function inside which it crashed

@All, thanks for your suggestions.
It turned out that the location containing the address was incorrectly being pointed to as a reference member of another structure, and that reference member was decremented by one in each call to free that structure.
The memory for that structure should have been allocated by a call to one of our functions. Instead, it was left referring to some garbage memory without any initialization or memory allocation, and it ended up referring to the static memory where the global structure is stored. This corrupted the static structure, which in turn led to the crash.
A thorough line-by-line analysis of our logs helped in putting all pieces together.

assigning a value to the address

I tried the program below to make a pointer point to a particular address and to store a value at that address. When I store a value through the pointer, I get a run-time error asking me to close the program.
Is it not possible to assign a value to the address 0x6778? Why is that so? In what situations is this needed? Please help me understand.
int *p = (int *)0x6778;
printf("The address is: %p", (void *)p);
When I tried to do *p = 1000, I got the error.

There are many reasons why this could give you an error:
The address 0x6778 might not be part of this process's virtual memory -- it might not really "exist". You could read more about virtual memory, but basically addresses don't refer directly to physical bytes -- they have to be translated in a table, and that table might not have an entry for your address.
If it is mapped, it might be on a read-only page
If it's mapped and writable, it might corrupt some other part of your program, causing a segfault soon after.
In general, you probably can't write to an arbitrary address in a user-level application. Of course, if you're running a kernel or embedded system, ignore this answer, as it totally does not apply ;-)

That address is likely not in your process's address space, so your program receives an exception from the operating system when you try to access it. You shouldn't be trying to use specific memory locations to store things... rather, use malloc for dynamic allocation, or put things on the stack.

int *p=(int*)0x6778;
To do this, the address 0x6778 must be a valid address in the first place.
An address space is allocated to every process, and your program runs in a particular process. If a program tries to access an address outside its address space, it will crash. That seems to be what is happening in your case.
Unless you are sure that a virtual address is valid for use by your program, DO NOT access addresses explicitly; let the compiler place objects in the address space allocated to your process and hand the addresses back to you. The simplest way to do that is to use local variables with automatic storage, or malloc for dynamic allocation.
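A minimal sketch of that advice: instead of inventing an address, ask malloc for storage and let it tell you where the storage lives. The function names store_value and show are made up for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocates room for one int and stores value in it.
   Returns NULL if the allocation fails; the caller frees the result. */
int *store_value(int value)
{
    int *p = malloc(sizeof *p);
    if (p)
        *p = value;
    return p;
}

/* Prints the address chosen by the allocator and the value stored there.
   %p expects a void *, hence the cast. */
void show(const int *p)
{
    printf("The address is: %p, the value is: %d\n", (const void *)p, *p);
}
```

The address printed will differ from run to run and from system to system, which is the point: the program never needs to know it in advance.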

Verifying that memory has been initialized in C

I've written an API that requires a context to be initialized and thereafter passed into every API call. The caller allocates the memory for the context, and then passes it to the init function with other parameters that describe how they want later API calls to behave. The context is opaque, so the client can't really muck around in there; it's only intended for the internal use of the API functions.
The problem I'm running into is that callers are allocating the context, but not initializing it. As a result, subsequent API functions are referring to meaningless garbage as if it was a real context.
I'm looking for a way to verify that the context passed into an API function has actually been initialized. I'm not sure if this is possible. Two ideas I've thought of are:
Use a pre-defined constant and store it in a "magic" field of the context to be verified at API invocation time.
Use a checksum of the contents of the context, storing this in the "magic" field and verifying it at invocation time.
Unfortunately I know that either one of these options could result in a false positive verification, either because random crap in memory matches the "magic" number, or because the context happens to occupy the same space as a previously initialized context. I think the latter scenario is more likely.
Does this simply boil down to a question of probability? That I can avoid false positives in most cases, but not all? Is it worth using a system that merely gives me a reasonable probability of accuracy, or would this just make debugging other problems more difficult?

Best solution, I think, is add create()/delete() functions to your API and use create to allocate and initialize the structure. You can put a signature at the start of the structure to verify that the pointer you are passed points to memory allocated with create() and use delete() to overwrite the signature (or entire buffer) before freeing the memory.
You can't entirely avoid false positives in C, because the caller may have malloc'd memory that "happened" to start with your signature; but make your signature reasonably long (say 8 bytes) and the odds are low. Taking allocation out of the hands of the caller by providing a create() function will go a long way, though.
And, yeah, your biggest risk is that an initialized buffer is free'd without using delete(), and a subsequent malloc happens to reuse that memory block.
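A minimal sketch of the create()/delete() idea with an 8-byte signature. The names api_create, api_is_valid and api_destroy, the signature value and the verbosity field are all hypothetical. The signature is cleared through a volatile lvalue because a compiler may otherwise drop a store that is immediately followed by free().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define CTX_SIGNATURE UINT64_C(0x4354585F4C495645)  /* arbitrary 8-byte tag */

typedef struct api_context {
    uint64_t signature;   /* must equal CTX_SIGNATURE while the context is live */
    int      verbosity;   /* stand-in for the real settings */
} api_context;

/* Allocates and initializes a context; returns NULL on allocation failure. */
api_context *api_create(int verbosity)
{
    api_context *ctx = calloc(1, sizeof *ctx);
    if (!ctx)
        return NULL;
    ctx->verbosity = verbosity;
    ctx->signature = CTX_SIGNATURE;   /* set last, once everything else is ready */
    return ctx;
}

/* Probabilistic check: true if ctx is non-NULL and carries the signature. */
bool api_is_valid(const api_context *ctx)
{
    return ctx != NULL && ctx->signature == CTX_SIGNATURE;
}

/* Wipes the signature before releasing the memory, so a later allocation
   that reuses the block does not look like a live context. */
void api_destroy(api_context *ctx)
{
    if (!api_is_valid(ctx))
        return;
    *(volatile uint64_t *)&ctx->signature = 0;
    free(ctx);
}
```

The check still dereferences whatever pointer the caller passes, so it guards against uninitialized or stale contexts, not against wild pointers.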

Your context variable is probably at the moment some kind of pointer to allocated memory. Instead of this, make it a token or handle that can be explicitly verified. Every time a context is initialised, you return a new token (not the actual context object) and store that token in an internal list. Then, when a client gives you a context later on, you check it is valid by looking in the list. If it is, the token can then be converted to the actual context and used, otherwise an error is returned.
#include <map>
#include <utility>

struct InternalContext { /* placeholder for the API's private state */ };

typedef long Context;                                  // opaque token handed to clients
typedef std::map<Context, InternalContext*> Contexts;  // token -> real context
Contexts _contexts;

Context nextContext()
{
    static Context next = 0;
    return next++;
}

Context initialise()
{
    Context c = nextContext();
    _contexts.insert(std::make_pair(c, new InternalContext));
    return c;
}

void doSomethingWithContext(Context c)
{
    Contexts::iterator it = _contexts.find(c);
    if (it == _contexts.end())
        throw "invalid context";
    // otherwise do stuff with the valid context variable
    InternalContext *internalContext = it->second;
}
With this method, there is no risk of an invalid memory access as you will only correctly use valid context references.

Look at the paper by Matt Bishop on Robust Programming. The use of tickets or tokens (similar to file handles in some respects, but also including a nonce, a number used once) allows your library code to ensure that the token it is using is valid. In fact, you allocate the data structure on behalf of the user, and pass back to the user a ticket which must be provided for each call to the API you define.
I have some code based closely on that system. The header includes the comments:
/*
** Based on the tickets in qlib.c by Matt Bishop (bishop@ucdavis.edu) in
** Robust Programming. Google terms: Bishop Robust Nonce.
** http://nob.cs.ucdavis.edu/~bishop/secprog/robust.pdf
** http://nob.cs.ucdavis.edu/classes/ecs153-1998-04/robust.html
*/
I also built an arena-based memory allocation system using tickets to identify different arenas.
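The ticket idea can be sketched in C roughly as follows. This is an illustration written for this page, not the code from qlib.c or from the answer's own library; the names (ticket, ctx_open, ctx_get_setting, ctx_close), the table size and the bit layout are all assumptions. A ticket packs a slot index with a nonce, and every call checks both before touching the slot.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CTX 16

typedef uint32_t ticket;   /* high 16 bits: nonce, low 16 bits: slot index */

typedef struct {
    int      in_use;
    uint16_t nonce;        /* nonce issued with the current ticket */
    int      setting;      /* stand-in for the real context data */
} slot;

static slot     slots[MAX_CTX];
static uint16_t next_nonce = 1;   /* 0 is never issued, so ticket 0 is invalid */

/* Returns the slot for a ticket, or NULL if the ticket is not currently valid. */
static slot *lookup(ticket t)
{
    uint16_t idx   = (uint16_t)(t & 0xFFFF);
    uint16_t nonce = (uint16_t)(t >> 16);
    if (idx >= MAX_CTX)
        return NULL;
    slot *s = &slots[idx];
    return (s->in_use && s->nonce == nonce) ? s : NULL;
}

/* Creates a context and returns its ticket, or 0 if the table is full. */
ticket ctx_open(int setting)
{
    for (uint16_t i = 0; i < MAX_CTX; i++) {
        if (!slots[i].in_use) {
            slots[i].in_use  = 1;
            slots[i].nonce   = next_nonce;
            slots[i].setting = setting;
            if (++next_nonce == 0)      /* skip 0 when the counter wraps */
                next_nonce = 1;
            return ((ticket)slots[i].nonce << 16) | i;
        }
    }
    return 0;
}

/* Reads the setting; returns 0 on success, -1 for an invalid ticket. */
int ctx_get_setting(ticket t, int *out)
{
    slot *s = lookup(t);
    if (!s)
        return -1;
    *out = s->setting;
    return 0;
}

/* Releases the context; returns 0 on success, -1 for an invalid ticket. */
int ctx_close(ticket t)
{
    slot *s = lookup(t);
    if (!s)
        return -1;
    s->in_use = 0;
    return 0;
}
```

Because the caller never holds a pointer, a stale or garbage ticket is rejected by a table lookup rather than by dereferencing unknown memory. With a 16-bit nonce the counter wraps after 65535 tickets, so a real implementation would likely want a wider nonce.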

You could define a new API call that takes uninitialised memory and initialises it in whatever way you need. Then, part of the client API is that the client must call the context initialisation function, otherwise undefined behaviour will result.

To sidestep the issue of a memory location of a previous context being reused, you could, in addition to freeing the context, reset it and remove the "magic" number, assuming of course that the user frees the context using your API. That way when the system returns that same block of memory for the next context request, the magic number check will fail.

See what your system does with uninitialized memory. For what Microsoft's compiler does, see: Uninitialized memory blocks in VC++
