What can cause Program Counter to have an invalid address? - c

I am getting an exception "Invalid Program Counter Address" in Vxworks + PPC 603.
The application links against multiple C libraries, and I am not able to pin down what could cause this problem.
Is there a possibility that incorrect compilation options could be causing this?
Any directions or pointers will be helpful.
Thanks
UPDATE:
I have a structure whose members are function pointers. The structure itself is static; its address is passed around, and different functions are invoked through it.
During one of the test rounds, I found that the function address stored in the function pointer had been reduced by 1. If the function address is 0x009a3730, the PC has 0x00913729.
Also, if I change the compiler options, the place of crash or the number of runs after which the crash happens changes.

Any case where you're working with function pointers can easily lead to this, if the pointer value gets corrupted and is called later. Check signal handlers, if any, and any other APIs that deal with callbacks.

"If the function address is 0x009a3730, the PC is having 0x00913729". The difference here is not 1 :) However PC will always point to the address of the next instruction it has to execute AFAIK.
Maybe you could load the core dump in a debugger and print out the following:
Back trace
'disassemble' code around the region of the crash
info registers -> register values at the time of the crash
info locals --> local variables of the function inside which it crashed

@All, thanks for your suggestions.
It turned out that the location containing the address was incorrectly being treated as a reference member of another structure, and that reference member was decremented by one in each call to free that structure.
The memory for that structure should have been allocated by a call to one of our functions. Instead, its pointer was left referring to garbage memory, without any initialization or allocation, and it ended up referring to the static memory where the global structure is stored. This corrupted the static structure, which in turn led to the crash.
A thorough line-by-line analysis of our logs helped in putting all pieces together.
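One way to catch this kind of corruption earlier is to compare the live function-pointer table against a read-only reference copy before calling through it. The following is only a minimal sketch: `dispatch_table`, `handler_double`, `handler_negate`, `table_is_intact`, `call_on_a_checked` and `live_table` are hypothetical names, not part of the original application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*handler_fn)(int);

/* Hypothetical handlers standing in for the real callbacks. */
static int handler_double(int x) { return 2 * x; }
static int handler_negate(int x) { return -x; }

struct dispatch_table {
    handler_fn on_a;
    handler_fn on_b;
};

/* Live table whose address is passed around the program. */
static struct dispatch_table g_table = { handler_double, handler_negate };

/* Read-only reference copy, used only to detect corruption of g_table. */
static const struct dispatch_table g_reference = { handler_double, handler_negate };

/* Accessor for the live table (assumed helper for this sketch). */
struct dispatch_table *live_table(void) { return &g_table; }

/* Returns true if every pointer in the table still matches the reference. */
bool table_is_intact(const struct dispatch_table *t)
{
    assert(t != NULL);
    return t->on_a == g_reference.on_a && t->on_b == g_reference.on_b;
}

/* Calls on_a only if the table is intact; reports the outcome through *ok. */
int call_on_a_checked(struct dispatch_table *t, int arg, bool *ok)
{
    assert(ok != NULL);
    *ok = table_is_intact(t);
    return *ok ? t->on_a(arg) : 0;
}
```

A check like this does not find the code that corrupts the table, but it turns a jump to a bad address into a failure that can be logged at the call site.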

Related

segmentation fault core dump after program runs and displays output

I have made some changes to the program I started with, but now I am getting a segmentation fault for a different reason. It happens after my output, and I think it may have to do with the free statement in my destroy function. I ran it through gdb and it told me I was trying to access memory location 0X000000d, which is weird because I can print out the memory location of my struct and it shows something different. I know I have probably missed something very small. Any help would be greatly appreciated, thanks!
I had to take my code down since it is an ongoing school project. Thanks for the replies; I will post it back once we have a grade.
You have undefined behavior in your code.
Take this line:
struct Person *UserOne=inputvalues(UserOne);
Here you define a variable UserOne and initialize it by calling the inputvalues function, for which you pass the uninitialized pointer. That means inside the inputvalues function, the temp pointer is uninitialized, and its value is indeterminate leading to said UB when you dereference the pointer.
One possible solution is to define a structure variable that is not a pointer and pass its address when calling inputvalues, or to dynamically allocate a structure and pass that to the function. Another is to redesign the program so that no argument is passed to the function at all, and the function itself allocates the structure.
Using uninitialized variables like this is easily detectable by compilers, and most can issue warnings for it. If you don't get such a warning you might want to consider enabling more warnings.
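The last option, letting the function allocate the structure itself, could look roughly like the sketch below. The original code was taken down, so the `name` and `age` fields and the functions `person_create` and `person_destroy` are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout; the real struct Person is not shown in the question. */
struct Person {
    char name[32];
    int  age;
};

/* Allocates and fills a Person. Returns NULL if allocation fails.
   The caller owns the result and must release it with person_destroy(). */
struct Person *person_create(const char *name, int age)
{
    assert(name != NULL);
    struct Person *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    strncpy(p->name, name, sizeof p->name - 1);
    p->name[sizeof p->name - 1] = '\0';  /* strncpy may not terminate */
    p->age = age;
    return p;
}

/* Releases a Person created by person_create(); NULL is accepted. */
void person_destroy(struct Person *p)
{
    free(p);
}
```

With this shape the caller writes `struct Person *UserOne = person_create("Ada", 36);` and never hands an uninitialized pointer to anything.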

AVR32 exception: Bus Data Error

Recently, I am facing a - to me - strange behavior in my embedded software.
What I got: Running a 32 bit AVR32 controller, starting the program from an external SDRAM, as the file size is too big to start it directly from the micro-controller flash. Due to the physical memory map, the memory areas are split between:
stack (start at 0x1000, length of 0xF000) ( < 0x1000 is protected by the MPU)
EBI SDRAM (start at 0xD0000000, length of 0x00400000).
What happens: Unfortunately I get an exception that is not reproducible. Looking at the stack trace, the following event occurs at irregular intervals:
Name: Bus error data fetch - Event source: Data bus - Stored Return Address: First non-completed instruction
Additionally, the stack pointer has a valid value, whereas the address where the exception occurs (last entry point for fetching instructions), points into the memory nirvana (e.g. 0x496e6372, something around 0x5..., 0x6....). I guess, this has to be the "First non-completed instruction", the manual is talking about. However, the line in my source code is always the same: accessing a member function from a data array via pointer.
if(mSomeArray[i])
{
mSomeArray[i]->someFunction(); <-- Crash
}
The thing is: adding or deleting other source code makes the event disappear and return again.
What I thought about: Something is corrupting my memory (mapping). What kinds of errors are possible for this?
A buffer overflow?
The SDRAM controller could be turned off, so it loses some data. That is not impossible, but rather improbable
The stack is big enough, I already checked this with a watermark
The Data Bus Rate and AVR clock are set correctly
How to solve this: More asserts? Unfortunately I cannot debug this with AVRStudio. Does anyone have a hint or an idea? Or am I missing something obvious?
Edit:
Mentioned approaches from users:
Check for addresses of function pointer and array entries
Overwrite of stack array
Not properly written interrupts
Not initialized pointers
Check for array access via i at crash case
use exception handler address for illegal memory access
use snprintf instead of sprintf
Late appendix to the thread: the issue was a wrong array access (wrong index was set) in an old software module, that had nothing to do with my modules. I found this by accident, it was a curiosity that it didn't appear earlier and it took me quite a while to find the line of code. I mark the only given answer as correct solution.
Thank you all for your input.
Take care (of your software ;))
Here are some ideas:
Check 'i' to make sure it is within the array bounds.
Check the address of the function pointer that is about to be called. It should have an address within the SDRAM.
See if the chip has an exception handler address it will jump to when it accesses illegal memory. Once you are there, output some debug data
If your debugger allows, set a breakpoint on someFunction() when it is written. This would catch some other function when it overwrites the function pointer.
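The first two checks can be sketched in C as follows. The SDRAM base and length come from the question; everything else (`addr_in_sdram`, `index_in_bounds`, `call_checked`, `counting_callback`, `call_count`) is a hypothetical helper, and the question's code is C++-style member calls rather than plain function pointers, so treat this as an illustration of the idea only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SDRAM_BASE   UINT32_C(0xD0000000)  /* from the question's memory map */
#define SDRAM_LENGTH UINT32_C(0x00400000)

typedef void (*callback_fn)(void);

/* True if addr lies inside [SDRAM_BASE, SDRAM_BASE + SDRAM_LENGTH). */
bool addr_in_sdram(uint32_t addr)
{
    return addr >= SDRAM_BASE && (addr - SDRAM_BASE) < SDRAM_LENGTH;
}

/* True if index i is valid for an array of count elements. */
bool index_in_bounds(size_t i, size_t count)
{
    return i < count;
}

/* Calls table[i] only when i is in bounds and the slot is non-NULL.
   Returns true if the call was made. */
bool call_checked(callback_fn const *table, size_t count, size_t i)
{
    assert(table != NULL);
    if (!index_in_bounds(i, count) || table[i] == NULL)
        return false;
    table[i]();
    return true;
}

/* Hypothetical callback used to observe whether a call happened. */
static int g_call_count;
void counting_callback(void) { g_call_count++; }
int call_count(void) { return g_call_count; }
```

On the target, the address of the object or function about to be used would be passed to `addr_in_sdram()` before the call; the value 0x496e6372 reported in the question fails that test.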

Function calls, the stack

So I don't know why but I learned that when you call a function and pass an argument to it, it deals with it on the stack(processor?).
Can someone please explain it?
then how does it change values of variables, blocks of memory and so on?
There is no guarantee that parameters are passed on the stack, it's architecture and compiler dependent.
As to how values and memory get changed -- when you call a function that must make changes that are seen by the caller, it's normal that what you provide is not the actual value, but rather the address of (pointer to) that value. As long as the function knows the proper memory location it can make these changes.
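A small sketch of that difference, using the made-up function names `set_by_value` and `set_by_pointer`:

```c
#include <assert.h>
#include <stddef.h>

/* Receives a copy of the argument: the caller's variable is unaffected. */
void set_by_value(int n)
{
    n = 99;      /* changes only the local copy */
    (void)n;
}

/* Receives the address of the caller's variable: writing through the
   pointer changes the variable the caller owns. */
void set_by_pointer(int *n)
{
    assert(n != NULL);
    *n = 99;
}
```

Calling `set_by_value(v)` leaves `v` as it was, while `set_by_pointer(&v)` sets it to 99, regardless of whether the argument itself travelled in a register or on the stack.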
The stack is used in most cases to pass arguments to a function. The reason for using it is that you are not bound to fixed memory locations (for arguments) for your function to work. If a function took its arguments from fixed memory, you could probably only run it when that memory was free, and you could run just one instance of it at a time. The stack lets you store your arguments in the current context of your program at any time. On x86 processors, one register (the stack pointer) points to the current top of the stack, and another (the base pointer) is commonly used to mark the start of the current stack frame. Those are actually just addresses in main memory where your stack resides.
The PUSH instruction moves the stack pointer to the next free slot and stores the specified data (a value from another register, from some address, or an immediate value) at the address the stack pointer now points to. The POP instruction does the same in reverse. This way, if you stick to the plan and keep track of what you pushed onto the stack, your functions can work from any context.
There are other options for passing arguments, such as registers, which are used for example by BIOS interrupts (and by many modern calling conventions). If you want to know more about this, I suggest you read up on "calling conventions".
Let's start with this: suppose you have a function
int foo(int value) {
int a = 10;
return a;
}
So whenever a function call is made, some memory is needed for the function's local variables (int a in this case) and for the arguments passed to it (int value in this case). This requirement is met by allocating memory on the stack. The stack is nothing but a memory region allocated to each process (more precisely, to each thread), and it behaves as a stack data structure (LIFO).
Now the question arises: what gets stored on the stack when a function call is made? In a typical stack-based calling convention:
1. The first things pushed onto the stack are the arguments passed to the function, in reverse order (if there is more than one).
2. Then the return address in the function that called this one (because once foo completes execution, it should return to the place in the code from where it was called).
3. Finally, the local variables of the called function are placed on the stack.
Once the called function completes executing the code it returns back to the return address previously stored on the stack and thus we say function call completes or returns.
In this case the function has a return value, which it passes back to the calling function.
The space is then free to use and can be overwritten in subsequent function calls.
Now, if you connect the dots, you can see why local (automatic) variables in a function have a lifetime limited to the function call (you asked an SO question related to scope which was closed): once a call returns, the memory space allocated for these local variables is gone. It is still there physically, but you can't validly access it once the function returns, so the life of an automatic variable such as int a lasts only until foo() returns to the calling function.
Side Note:: I have read many questions that you have posted in SO. I guess you are trying to learn C and basic working of the underlying hardware and OS in general and the confusion in between them is killing you.
I would suggest you some pointers apart from the answer to this question to read and understand which will give you lots of insight into the questions you are facing.
For C refer K&R it is the best book.
In the starting read little bit about OS concepts(Memory handling, Virtual Memory in particular)
Try imagine the working of a system in broad sense as in how different components are interacting.
Some good links for understanding memory related stuff and system internals http://duartes.org/gustavo/blog/best-of
and if you want to dive into stack space for a function call try this link http://www.binarypirates.in/2011/02/17/understanding-function-stack-in-c/
Hope this helps

Is there a way to test for an invalid memory location?

In a language like C, for example, if a routine receives a pointer, is there any system call or other test that can be applied to the pointer to check that it is a valid memory location, other than catching SIGSEGV or equivalent?
No, you can't check for sure whether an address is invalid. Even if you used some operating system function to test whether the address is mapped into the address space, you still can't be sure that the address doesn't belong to some service data that you should not read or modify.
One good example: if your program uses Microsoft RPC to accept calls from another program, you have to implement a set of callback functions to serve the requests. Those callback functions will run on separate threads started by RPC. You don't know when those threads start or what their stack size is, so you can't detect a buffer overrun that occurs when you write through an address that is meant to belong to a stack variable but accidentally lands in the stack of another thread.
Well, if you knew where the memory being pointed to was being stored (on the stack, for instance), you could check to see if it's in a certain 'range' that is the approximate address range of the stack. That could also work for something on the heap, if you have an idea of how big your heap "should" be. It's definitely not a fail-safe approach, but I'm unaware of any sure-fire methods for checking the 'validity' of a pointer.
If you mean purely within your own application, you can establish a convention that any memory allocated by your code is initialized in a way you can recognize. E.g., in one project I saw, they wrote an eyecatcher in the first few bytes. In some products I know, they write a unique id at the start and end, and each time the memory is accessed they check that the two ids still match, to show it has not been corrupted. E.g., CICS on z/Series does the latter.
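A minimal sketch of the start-and-end id idea, assuming a byte-oriented payload. The marker value and the names `guard_header`, `guarded_alloc`, `guarded_check` and `guarded_free` are invented for this example and are unrelated to how CICS or any other product implements it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_ID UINT32_C(0xC0FFEE42)  /* arbitrary marker for illustration */

/* Header placed in front of every payload. */
typedef struct {
    size_t   size;  /* payload size, needed to locate the trailing id */
    uint32_t id;    /* leading id */
} guard_header;

/* Allocates size payload bytes framed by a leading and a trailing id.
   Overflow of the size computation is not handled in this sketch. */
unsigned char *guarded_alloc(size_t size)
{
    guard_header *h = malloc(sizeof *h + size + sizeof(uint32_t));
    if (h == NULL)
        return NULL;
    h->size = size;
    h->id = GUARD_ID;
    unsigned char *payload = (unsigned char *)(h + 1);
    uint32_t tail = GUARD_ID;
    memcpy(payload + size, &tail, sizeof tail);  /* trailing id */
    return payload;
}

/* True if both ids around the payload are still intact. */
bool guarded_check(const unsigned char *payload)
{
    assert(payload != NULL);
    const guard_header *h = (const guard_header *)(const void *)payload - 1;
    uint32_t tail;
    memcpy(&tail, payload + h->size, sizeof tail);
    return h->id == GUARD_ID && tail == GUARD_ID;
}

/* Releases a block obtained from guarded_alloc(); NULL is accepted. */
void guarded_free(unsigned char *payload)
{
    if (payload != NULL)
        free((guard_header *)(void *)payload - 1);
}
```

This only validates pointers that came from `guarded_alloc()`; passing any other pointer to `guarded_check()` is itself invalid, which is exactly the limitation the answers above describe.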

What is "null pointer assignment error"?

One of job interview questions on C pointers here is the following: what is null pointer assignment error?
I've googled for a while and don't see any reasonable explanation. What is that? Trying to write through a null pointer? Something architecture- or environment-specific? What exactly is that error?
http://www.faqs.org/qa/qa-3786.html
A NULL pointer assignment is a runtime error.
It occurs for various reasons; one is that your program has tried to access an illegal memory location.
An illegal location means a location that is either in the operating system's address space or in another process's memory space.
In stdio.h NULL is defined as 0.
So whenever your program tries to access the 0th location, the operating system kills your program with a runtime assignment error, because the 0th location is in the operating system's address space and the operating system doesn't allow user programs to access its address space.
Example code:
int* ptr = NULL;
*ptr = 3;
Explanation:
On almost every system, address 0 is reserved. System won't allow you to write to that location. If you try, you will get a runtime exception (access violation, segmentation fault, etc.).
I cannot recall the source, but according to it, this run-time error is restricted to the small and medium memory models used by the corresponding compiler. As said before, the null pointer does not necessarily point to address zero; different compilers use different, but fixed, memory locations as the null pointer.
Let's consider the case of the TC compiler: it places four zero bytes at the bottom of the data segment, followed by the TC copyright notice. TC also uses DS:0000, the bottom of the data segment, as the null pointer's location. So assigning a value through this null pointer would actually change the four bytes and probably mess up the copyright notice.
Now, at program termination, the four zeros and the copyright banner are checked for any kind of alteration. If any alteration is found, a Null Pointer Assignment error is generated.
So I think it's not just the null pointer: if any pointer goes wild and writes to those key areas, you are greeted with a Null Pointer Assignment error.
There are many scenarios where you can see problems. But the key thing is, you did not allocate the memory correctly. The following code would produce Null pointer assignment error message after you run the program. Note: It will compile correctly.
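#include <string.h>

void CopyMessage(char *p)
{
    /* p holds whatever indeterminate value the caller passed in */
    strcpy(p, "welcome");
}

int main(void)
{
    char *src;          /* BUG: never initialized, points nowhere valid */
    CopyMessage(src);   /* undefined behavior: writes through a wild pointer */
    return 0;
}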
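For contrast, a version that gives the copy a real destination might look like this. It is a sketch only: the extra `dst_size` parameter is an addition made here so the function can avoid overrunning the buffer, and is not part of the original example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the greeting into dst, which must have room for dst_size bytes.
   The result is always NUL-terminated and truncated if dst is too small. */
void CopyMessage(char *dst, size_t dst_size)
{
    assert(dst != NULL && dst_size > 0);
    strncpy(dst, "welcome", dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

The caller now owns the storage, for example `char buf[16]; CopyMessage(buf, sizeof buf);`, so there is no uninitialized pointer for the write to go through.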
It is a run-time error that occurs when you try to write through a pointer to an illegal memory location, usually address 0, which is reserved for the OS.
My intent in this answer is supplemental to the most basic of concepts of a null pointer. This simplest definition, as listed in many places is whenever a base value pointing to an address gets assigned a '0' or NULL value because the zero page, 0th memory location is part of the operating systems address space and operating system doesn't allow access to its address space by the user's program. In such cases the pre-compiler or compiler may generate an error, or the error can be generated by the operating system itself during run time as a memory access violation.
The following discussion of null pointers is based on concepts contained in the programming that occurs at the machine level of a language that allows fine control and requires understanding of how variable space is addressed.
Most high-level languages and compilers can prevent this from occurring through appropriate type casting, specifying an option base, and making no mental miscalculations in indexing. C as a language, unless the strictest compiler parameters are specified, is particularly prone to these types of errors, as are the less sophisticated machine-based compilers or programming languages found today in 'pocket' processors.
However, since the inception of computers and programming languages the concept of a zero pointer error expanded to include any pointer that points to the 0th location in any protected memory location. But especially within the context of how a memory location that can be used to point to any memory location can be unintentionally overwritten to contain a null value. And here I examine this concept because of what I've called 'the error of 1' which occurs when programmers have to switch between option base '0' or option base '1'. This is a counting issue where we begin our count with '0' or '1' as in:
Option Base 0
[0,1,2,..,9] or
Option Base 1
[1,2,3,...,10]
for an array with 10 elements. An error of 1 can create a miscalculation which results in a pointer to the first memory location 'before' the array,
Option Base 1
0[1,2,3,...,10]
^
|Last memory location of another variable space 'before' this variable space
or the first position 'after' the array which by definition is out of bounds.
Option Base 0
[0,1,2,...,9]10
^
|First memory location of another variable after this variable
But when dealing with programming that uses direct memory access, as in the original machine code of any language, an error of 1 can be tragic: it places an unintended value in a memory location outside the intended range. In the case of variable space and pointers, that location belongs to the variable before or after the intended one. When the array is initialized or cleared, this creates a 'null' or 0 in an undesired location and, especially if the neighbor is an array of pointers, a null pointer error in an unintended variable.
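One defensive habit against the 'error of 1' is to funnel every base-1 position through a single checked conversion to a base-0 index. The sketch below assumes the 10-element array from the diagrams above; `ELEMENT_COUNT` and `base1_to_index` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ELEMENT_COUNT 10  /* array size used in the diagrams above */

/* Converts a base-1 position (1..ELEMENT_COUNT) into a base-0 index
   (0..ELEMENT_COUNT-1). Returns false, leaving *index untouched, when the
   position falls before or after the array. */
bool base1_to_index(size_t position, size_t *index)
{
    assert(index != NULL);
    if (position < 1 || position > ELEMENT_COUNT)
        return false;
    *index = position - 1;
    return true;
}
```

Positions 0 and 11, the two slots marked with `^` in the diagrams, are rejected instead of silently addressing a neighboring variable.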
This is of course dependent upon how the variable space is structured and/or what type. Can be particularly troublesome if the variable or other storage address space is nested in the code. As I state earlier, many high level language compilers can circumvent most of this type of error; but, when writing specific subroutines in machine code, for whatever reason deemed necessary, one must take extra care to ensure that option base is explicitly defined and adhered to by practice if not compiler convention.
First, programmers recognize the necessity that both the program and the storage areas are clearly defined and that nothing, without express consent, should modify even a single bit of data. This is critical, with respect to a null pointer because the 0th memory location, in zero page area of the operating system is often used to store a stack which are memory locations pushed onto the stack for a return operation. Whether a system call pushes an address for a return operation (popping the return address from where the system was interrupted) due to a mask-able or non-mask-able interrupts, or because the programmer wants to push data, or a memory location to be later popped off of this stack. This is a protected area. Like any pointer to a valid memory address, one would not want to write to the wrong location, the 0th location is particularly susceptible if overwritten because variables are often null or have a value of 0 from an initial power up state, and thus a variable which has not been explicitly defined after power up, or has been intentionally initialized is likely to be zero.
In the case of the stack on zero page, or any stack containing return address, if values are pushed onto the stack, and not popped before a 'return' is encountered, the return value can be null or zero and the return points to the stack's memory location. This is a null pointer error, which may not generate an error but return the code pointer to an area that does not contain code, such as the middle of a stack. These exploits are well known and often used in methods to compromise a system's security to gain access by less scrupulous crackers; or can be used for ingenious access under special circumstances, or when accidental create all kinds of mischief where the source is difficult to determine.
As I stated, this description is outside the conventional definition of a null pointer error, but it can produce a null pointer nonetheless, though more often producing other errors, or none at all. It often gives no indication of its existence other than 'if or when' a program fails to perform as expected.
Here I provide additional non-conventional examples and definitions of potential sources of null pointer assignment errors, rather than defining conventional understanding which is more an error in programming convention than an error in programming logic.
This type of error (undefined, or null) is much rarer. But with modern programming of 'pocket' processors, using benchtop devices like an Arduino, Raspberry Pi, AMD, or any other computer on a chip, programming exists in a myriad of forms, many of which are as simple today as in yesteryear, so this null pointer problem still exists and can occur even in the most sophisticated of systems. Additionally, companies that build their own variable or data structures are probably the most likely to see this type of error nowadays. The intention is to show examples that can aid in recognition.
In older days it was soon recognized that the conditions which produce null pointer errors could also produce errors where the value of the pointer was unintentionally modified. A variable being used as a pointer can be overwritten without the programmer's intent or knowledge, and the new value could be null, but could also be any other value. So we find that an issue which can create a null pointer can also create a non-null pointer. The null pointer is a special case, which OFTEN creates a systematic error message; but when those same conditions cause the pointer to take on a random or undetermined value instead of the original address where the data should reside, it now contains a null or unknown address, which results in moving or storing data to an invalid or unwanted location, potentially overwriting and corrupting the code or data there.
Most will rightfully argue that this is NOT a null pointer error; and, they are completely 100% CORRECT! However, the roots of this error typically create strange often seen null pointer errors because more often the pointers will contain a 'null'! The point of this exercise in definition is to point out how the creation of null pointers can also lead to a problem which appears to have no indication of the original source of the problem. IOW, in concept there is no pointer to the problem. Because of the relationship to the creation of odd 'null' pointer issues, and in this case the subsequent lack of data that points to the source of the error because the pointer was NOT null and instead is 'undefined' old timers who have transitioned from top-down programming to object oriented event driven programming recognize this relationship and recognize this type of 'null' pointing error which appears to have no definable source.
Because this type failure, the corrupted data, or corrupted code may not immediately execute or get used at the time it is moved to an existing unused memory location. However, when the code or data does cause a problem at a later time in the run, there is no information on the 'real' location of the error because it is so far removed in time from the event that caused it. Whether it creates or assigns null pointers or creates some other corruption, it modifies code, and things can get weird, really weird.
To summarize, I define a null pointer as any null or undefined address used to point to a memory location regardless of what originally creates it. A null pointer assignment error, or many other errors, can be assigned to this issue and example.
In simpler architecture or programming environments, It can refer to any code which unintentionally ends up creating nulls as pointers, or creates a bug that in anyway halts the execution, like overwriting a byte in the return stack, overwriting code, code that accidentally stores a '0' in a bad location, existing code, or just as data in the wrong location, not just as an address.
So, while the examples above, work fine to define an example of a null pointer. So we expand the concept, A null pointer is any pointer which gets used as a variable pointer, and the address location of that variable for any one of multiple reasons now contains a 'null' or ANY unintended value that causes it to point to an undesirable memory location no matter how it got there, not just errors in programming logic or math calculation errors. IOW, not just a 0 in a pointer; more specifically a zero or undefined value in any memory location where that memory location was not the specific target and under other circumstances had an OTHER purpose for which it will now perform!
So, finally, one can get a null pointer error, and upon examining the pointer finds it contains a null; but, cannot find the code that placed the null into the pointer or assigned it. This is a broadest definition of null pointer assignment error, and is absolutely the worst case scenario of a null pointer error. When this occurs in a large program, it often results in the death of the program because if the error existed in previous versions, but was writing to unintended memory locations (which allowed the program to function or IOW) which was previously accessible but unallocated in earlier versions, the error goes unnoticed until the program gets expanded, and now that once previously unused memory location contains new code OR data which allows the old bug to now generate random errors in the new code or corrupts data!
For example: in an earlier version, the bad address value causes data to be written outside of the defined variable spaces, but goes unnoticed for several versions because the data is being written, and it being read, and the program and everything 'appears' OK! But, as the program expands, new code now exists relatively in the same relative address space as the memory where the original old bug had been incorrectly writing to the wrong memory location and no one noticed, whether one byte, or a whole block of data! But, now, there exists new code there. And when the program runs that particular code, today or tomorrow as whatever function that contains it is called, the new data gets corrupted by the old undiscovered bug.
Finding the 'original' error which existed a year earlier is now almost, if not completely, impossible to find.
Administrator and developer logic usually asks: why should I look there? We know that code ran and worked just fine for the last several versions. But now part of the new code does not work, and major pieces are broken. We look and look and find nothing. It's as if the error doesn't exist, and yet it does. What's causing it? Who suspects code written years earlier? Null pointers and many other errors are caused by this too. With understanding, a good editor that can examine code directly, and appropriate monitors that watch for modified memory locations and trigger a halt so the code being executed at the right time can be identified, even this can be found.
