C function scope array persists between calls

I'm writing an assembler for a language I'm making up as I go along, and I'm parsing labels.
Labels start with an octothorpe, and end with a whitespace character, so as I parse, if I encounter an #, I call my make_label function.
make_label looks something like:
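#include <stdint.h>
#include <stdio.h>

uint32_t make_label(FILE *f) {
uint8_t i=0;
int c; /* int, so fgetc's EOF can be told apart from a real character */
char buffer[64];
while ( (c = fgetc(f)) != ' ') {
buffer[i++] = (char)c;
}
/* Bug discussed below: buffer is never '\0'-terminated. */
// Do thing with label
return 1;
}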
There's a bit more to it, but that's the general gist. There's a bug as written, which uncovered some weird behaviour I don't quite understand.
I forgot to '\0' terminate the buffer. When I examined the labels, given an input like:
#start
...
#loop
...
#ignore
...
#end
...
I would see the labels:
start
loopt
ignore
endore
The buffer variable was keeping its value between calls.
I'm not sure it really matters, because I realise I should have been adding the null terminator, but I was curious as to why this is happening. I was using printf to output the buffer, and it didn't seem to care that there was no terminator, which is why I didn't notice immediately.
Is this all just dumb luck? The array as declared just happened to be zeroed, and each call just happened to allocate the same block on the stack?

Yep, seems like it!
To address both parts of the question:
The array as declared just happened to be zeroed
This is not so surprising. According to my vague memories of contemporary operating system design, backed up by the other Stack Overflow answers to "Kernel zeroes memory?", pages of memory that the OS hands to a process are zeroed to begin with, for security reasons. So if you haven't touched that part of the stack before, it will probably be 0. (Do not rely on this.)
and each call just happened to allocate the same block on the stack
This is not so surprising either. This function allocates a block of the same size on the stack every time it is called. Furthermore, in your example it seems that whenever you call the function you aren't in the middle of parsing anything else, which implies you aren't in the middle of calling any other functions. So there is nothing else on the stack that would shift this call's frame, and the block is always allocated in the same place.
That's just my intuition about what's happening; you can experiment to see if it matches reality.
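A minimal sketch of the fix, assuming labels end at any whitespace character or at end of file, and that the caller has already consumed the leading `#`. The name `read_label` and its `buf`/`cap` parameters are hypothetical, not the asker's `make_label`. Because every call writes its own `'\0'`, leftovers from an earlier label (the `loopt`/`endore` effect) can no longer show through.

```c
#include <ctype.h>
#include <stdio.h>

/* Read a label body from f into buf (capacity cap, must be > 0).
 * Stops at the first whitespace character or EOF; the delimiter is consumed.
 * Characters that do not fit are read and discarded. The result is always
 * '\0'-terminated. Returns the number of characters stored. */
size_t read_label(FILE *f, char *buf, size_t cap)
{
    size_t i = 0;
    int c; /* int, not char, so EOF can be detected */

    while ((c = fgetc(f)) != EOF && !isspace(c)) {
        if (i + 1 < cap) /* always leave room for the terminator */
            buf[i++] = (char)c;
    }
    buf[i] = '\0';
    return i;
}
```

Reading `#start` and then `#loop` into the same 64-byte buffer now yields `start` and `loop`, not `loopt`.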

Related

why doesn't char newWord[45]; have "clean" values at the start of a function like char newWord[45] = ""; does?

I'm a bit confused by these two.
I have a function called check that does the following:
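#include <ctype.h>
#include <stdbool.h>

#define LENGTH 45 /* assumed value, taken from the question title */

bool check(const char *word)
{
char newWord[LENGTH + 1] = "";
for (int i = 0; word[i] && i < LENGTH; i++)
{
newWord[i] = tolower((unsigned char)word[i]);
}
/* ... the rest of check() is not shown in the question ... */
return true; /* placeholder */
}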
Now, for example, if I use = "", the variable newWord has all of its values set to '\0' every time I run check().
But when I use char newWord[LENGTH + 1]; the variable seems to keep the old values even after my function has returned, so when I call check() again, newWord already holds values from the previous call.
I know this is related to pointers and memory allocation but I just cannot seem to get how this works.
It's not a fancy answer, but compilers (and versions of compilers) have different opinions on whether to initialize memory before you use it. The only variables the language guarantees to initialize automatically (to zero) are those with static storage duration: global variables and those explicitly marked static.
For everything else, certain compilers might set everything to zeroes (or another value, for debugging), but most won't add that small overhead to your program when you're probably going to assign a value of your own soon enough. One of the biggest debugging efforts of my career came about because we changed from a C compiler that didn't pre-initialize variables (as most don't) to one that did, exposing a bunch of errors my predecessors hadn't caught (assigning to the wrong variable, looking for non-zero values and suddenly finding them), so it's an important behavior to know about.
The key to your question, though, is "seems to keep": it's only an accident that the old string is still in the right position. If you call another function between check() calls, you'll start to see different scratch memory.
Moral of the story? Always initialize every one of your variables, unless you absolutely know that it's going to get a value before you use it.
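One way to follow that advice without depending on how the array was initialized is to write the terminator yourself on every call. A short sketch, assuming the goal is the lower-case copy from check(); `lowercase_copy` is a hypothetical helper name.

```c
#include <ctype.h>
#include <stddef.h>

/* Copy word into out (capacity cap, must be > 0) in lower case.
 * Every call writes its own '\0', so the result never depends on what the
 * array held before. Words that do not fit are truncated. */
void lowercase_copy(char *out, size_t cap, const char *word)
{
    size_t i = 0;
    for (; word[i] != '\0' && i + 1 < cap; i++)
        out[i] = (char)tolower((unsigned char)word[i]); /* cast avoids UB for negative char values */
    out[i] = '\0';
}
```

Calling it with "HELLO" and then "Hi" on the same buffer gives "hello" and then "hi", never a leftover like "hillo".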
The values that you find after running check() twice are still garbage values. When you allocate some memory using
char newWord[LENGTH + 1];
You are always getting "some" memory that the operating system hands you (in layman's terms), which initially holds garbage values. It is just a coincidence that you're getting the same memory block that you got from the previous call to check().
However, when you do:
char newWord[LENGTH + 1] = "";
You are explicitly initializing every element to '\0': when an initializer is shorter than the array, the remaining elements are set to zero.
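A small sketch of that zero-filling in action, assuming `LENGTH` is 45 as in the question title; `lowered_length` is a hypothetical helper. The copy loop never writes a terminator, yet the string is always properly ended, because the `= ""` initializer zero-fills the whole array on every call.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define LENGTH 45 /* assumed value, taken from the question title */

/* Lower-case word into a local array and return the resulting length. */
size_t lowered_length(const char *word)
{
    char newWord[LENGTH + 1] = ""; /* every element starts as '\0' */
    for (size_t i = 0; word[i] != '\0' && i < LENGTH; i++)
        newWord[i] = (char)tolower((unsigned char)word[i]);
    /* No explicit terminator written: the zero-filled tail provides it. */
    return strlen(newWord);
}
```

Calling it with "HELLO" and then "Hi" returns 5 and then 2; without the initializer the second result would be indeterminate.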
If you don't initialize a local variable, it has an indeterminate value. That could be anything, including random "garbage" or leftovers sitting in RAM from a previous execution. There are no guarantees about what value you will get, and that's it.
Accessing such an uninitialized variable's value might also invoke undefined behavior in some cases:
(Why) is using an uninitialized variable undefined behavior?

C: dynamic char-array crashes heap

I have yet again a question about the workings of C. (ANSI-C compiled by VS2012)
I am refactoring a standalone program (.exe) into a .dll. This works fine so far, but I stumble across problems when it comes to logging. Let me explain:
The original program - while running - wrote a log-file and printed information to the screen. Since my dll is going to run on a webserver, accessed by many people simultaneously there is
no real chance to handle log-files properly (and clean up after them)
no console-window anyone would see
So my goal is to write everything that would be put in the log-file or on the screen into string-like variables (I know that there are no strings in C), which I can then pass on request to the caller (also a dll, but written in C#).
Since in C such a thing is not possible:
char z88rlog;
z88rlog="First log-entry\n";
z88rlog+="Second log-entry\n";
I have two possibilities:
char z88rlog[REALLY_HUGE];
dynamically allocating memory
In my mind the first way is to be ignored because:
The potential waste of memory is rather enormous
I still may need more memory than REALLY_HUGE, thus creating a buffer overflow
which leaves me with the second way. I have done some work on that and came up with two solutions, neither of which works properly.
/* Solution 1 */
void logpr(char* tmpstr)
{
extern char *z88rlog;
if (z88rlog==NULL)
{
z88rlog=malloc(strlen(tmpstr)+1);
strcpy(z88rlog,tmpstr);
}
else
{
z88rlog=realloc(z88rlog,strlen(z88rlog)+strlen(tmpstr));
z88rlog=strcat(z88rlog,tmpstr);
}
}
In solution 1 (as in solution 2, you will find) I pass my new log-entry through char tmpstr[255];. My "log-file" z88rlog is declared globally, so I need extern to access it. I then check whether memory has been allocated for z88rlog. If not, I allocate memory the size of my log-entry (+1 for my \0) and copy the contents of tmpstr into z88rlog. If so, I realloc memory for z88rlog at the size it has been + the length of tmpstr (+1). Then the two "strings" are joined using strcat. Using breakpoints and the Immediate window, I obtained the following output:
z88rlog
0x00000000 <Bad Ptr>
z88rlog
0x0059ef80 "start Z88R version 14OS"
z88rlog
0x0059ef80 "start Z88R version 14OS
opening file Z88.DYNÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍýýýý««««««««þîþîþîþ"
It shows three consecutive calls of logpr (breakpoint before strcpy/strcat). The indistinguishable gibberish at the end results from memory allocation. After that, VS gives an error message that something caused the debugger to set a breakpoint in realloc.c. Because this obviously doesn't work, I concocted my wonderful solution 2:
/* Solution 2 */
void logpr(char* tmpstr)
{
extern char *z88rlog;
char *z88rlogtmp;
if (z88rlog==NULL)
{
z88rlog=malloc(strlen(tmpstr)+1);
strcpy(z88rlog,tmpstr);
}
else
{
z88rlogtmp=malloc(strlen(z88rlog)+strlen(tmpstr+1));
z88rlogtmp=strcat(z88rlog,tmpstr);
free(z88rlog);
z88rlog=malloc(strlen(z88rlogtmp)+1);
memcpy(z88rlog,z88rlogtmp,strlen(z88rlogtmp)+1);
free(z88rlogtmp);
}
}
Here my aim is to create a copy of my log-file, free the original's memory, allocate new memory for the original at the new size, and copy the contents back. And not to forget to free the temporary copy, since it's allocated via malloc. This crashes instantly when it reaches free, again telling me that the heap might be broken.
So let's comment out free for the time being. This does work better, much to my relief, but while building the log-string suddenly not all characters from z88rlogtmp get copied. Still, everything works more or less properly, until suddenly I am told again that the heap might be broken and the debugger puts a breakpoint at the end of _heap_alloc (size_t size) in malloc.c. According to the debugger, size has the value 1041.
So I have 2 (or 3) ways I want to achieve this "string-growing", but none works. Might the size reported with the error point to the conclusion that the array has become too big? I hope I explained well what I want to do and someone can help me :-) Thanks in advance!
(irony on) Maybe I should just go and buy some new heap for the computer. Does it fit in RAM slots? Can anyone recommend a good brand? (irony off)
This is one mistake in Solution 1:
z88rlog=realloc(z88rlog,strlen(z88rlog)+strlen(tmpstr));
as no space is allocated for the terminating null character. Note that you must store the result of realloc() in a temporary variable to avoid a memory leak in the event of failure. To correct:
char* tmp = realloc(z88rlog, strlen(z88rlog) + strlen(tmpstr) + 1);
if (tmp)
{
z88rlog = tmp;
/* ... */
}
Mistakes in Solution 2:
z88rlogtmp=malloc(strlen(z88rlog)+strlen(tmpstr+1));
/*^^^^^^^^^*/
it is calculating one less than the length of tmpstr. To correct:
z88rlogtmp=malloc(strlen(z88rlog) + strlen(tmpstr) + 1);
Pointer reassignment resulting in undefined behaviour:
z88rlogtmp=strcat(z88rlog,tmpstr);
/* Now, 'z88rlogtmp' and 'z88rlog' point to the same memory. */
free(z88rlog);
/* 'z88rlogtmp' now points to deallocated memory. */
z88rlog=malloc(strlen(z88rlogtmp)+1);
/* This call ^^^^^^^^^^^^^^^^^^ is undefined behaviour,
and from this point on anything can happen. */
memcpy(z88rlog,z88rlogtmp,strlen(z88rlogtmp)+1);
free(z88rlogtmp);
Additionally, if the code is executing within a Web Server it is almost certainly operating in a multi-threaded environment. As you have a global variable it will need synchronized access.
You seem to have many problems. To start with, in your realloc call you don't allocate space for the terminating '\0' character. In your second solution you have strlen(tmpstr+1), which isn't correct. In your second solution you also use strcat to append to the existing buffer z88rlog, and if it's not big enough you overwrite unallocated memory, or data allocated for something else. The first argument to strcat is the destination, and that is also what the function returns, so you lose the newly allocated memory too.
The first solution, with realloc, should work fine, if you just remember to allocate that extra character.
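A sketch of solution 1 with both fixes applied (the `+ 1` and the temporary pointer for realloc), assuming `z88rlog` lives in the same file and that the function is only ever called from one thread. Unlike the original `void logpr`, this version returns an int status, which is a choice made here, not part of the question.

```c
#include <stdlib.h>
#include <string.h>

/* The growing log buffer; NULL until the first entry is appended. */
static char *z88rlog = NULL;

/* Append tmpstr to z88rlog. Returns 0 on success, -1 if allocation fails,
 * in which case z88rlog is left unchanged. Not thread-safe. */
int logpr(const char *tmpstr)
{
    size_t oldlen = z88rlog ? strlen(z88rlog) : 0;
    size_t addlen = strlen(tmpstr);

    /* realloc(NULL, n) behaves like malloc(n), so one call covers both cases.
     * The + 1 reserves room for the terminating '\0'. */
    char *tmp = realloc(z88rlog, oldlen + addlen + 1);
    if (tmp == NULL)
        return -1; /* the old buffer is still valid and still owned */

    memcpy(tmp + oldlen, tmpstr, addlen + 1); /* copies the '\0' too */
    z88rlog = tmp;
    return 0;
}
```

Using memcpy at a known offset also avoids rescanning the whole log with strcat on every append.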
In solution 1, you would need to allocate space for the terminating null character. Hence, the realloc should include one more byte, i.e.
z88rlog=realloc(z88rlog,strlen(z88rlog)+strlen(tmpstr) + 1);
In the second solution, I am not sure about z88rlogtmp=strcat(z88rlog,tmpstr); because z88rlog is the destination string. If you wish to use malloc only, then:
z88rlogtmp = malloc(strlen(z88rlog) + 1);                  // Allocate a temporary string
strcpy(z88rlogtmp, z88rlog);                               // Make a copy
free(z88rlog);                                             // Free current string
z88rlog = malloc(strlen(z88rlogtmp) + strlen(tmpstr) + 1); // Re-allocate memory
strcpy(z88rlog, z88rlogtmp);                               // Copy first string
strcat(z88rlog, tmpstr);                                   // Concatenate the next string
free(z88rlogtmp);                                          // Free the temporary string
// (checks for malloc returning NULL omitted for brevity)

How does strcpy work behind the scenes?

This may be a very basic question for some. I was trying to understand how strcpy actually works behind the scenes, for example in this code:
#include <stdio.h>
#include <string.h>
int main ()
{
char s[6] = "Hello";
char a[20] = "world isnsadsdas";
strcpy(s,a);
printf("%s\n",s);
printf("%zu\n", sizeof(s)); /* %zu is the correct format for size_t */
return 0;
}
As I am declaring s as a fixed-size array smaller than the source, I thought it wouldn't print the whole string, but it did print world isnsadsdas. So I thought that strcpy might be allocating a new size if the destination is smaller than the source. But when I check sizeof(s), it is still 6, yet it prints more than that. How does that actually work?
You've just caused undefined behaviour, so anything can happen. In your case, you're getting lucky and it's not crashing, but you shouldn't rely on that happening. Here's a simplified strcpy implementation (but it's not too far off from many real ones):
char *strcpy(char *d, const char *s)
{
char *saved = d;
while (*s)
{
*d++ = *s++;
}
*d = 0;
return saved;
}
sizeof is just returning you the size of your array from compile time. If you use strlen, I think you'll see what you expect. But as I mentioned above, relying on undefined behaviour is a bad idea.
http://natashenka.ca/wp-content/uploads/2014/01/strcpy8x11.png
strcpy is considered dangerous for reasons like the one you are demonstrating. The two buffers you created are local variables stored in the stack frame of the function. Here is roughly what the stack frame looks like:
http://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/Call_stack_layout.svg/342px-Call_stack_layout.svg.png
FYI, on most common architectures the stack grows downward, toward lower addresses: newer entries are placed 'below' older ones in memory (this does not mean variables are read backwards). So if you write far enough past the end of a local buffer in your function's stack frame, you write forward over every other stack variable after the one you are copying to, break into other sections of the frame, and eventually overwrite the saved return address. If you are clever, you then have full control of where the function returns. You could make it do anything really, but it isn't YOU that is the concern.
As you seem to know, since you made your first buffer 6 chars long for a 5-character string, C strings end in a null byte \x00. The strcpy function copies bytes until the source byte is 0, but it does not check that the destination is that long, which is why it can copy past the end of the array. This is also why printf reads the buffer past its size: it reads until \x00. Interestingly, strcpy may have written into the storage of a, depending on the order the compiler placed the arrays on the stack, so a fun exercise could be to also print a and see whether you get something like 'snsadsdas'. I can't be sure what it would look like even if s overflowed into a, because there are sometimes padding bytes between stack entries for various reasons.
If this buffer holds, say, a password to check with a hashing function, and you copy it into a stack buffer from wherever you get it (a network packet if you're a server, a text box, etc.), you may very well copy more data from the source than the destination buffer can hold, handing control of your program's return address to any user who can send you a packet or try a password. They just have to type the right number of characters, followed by the characters that encode an address somewhere in memory to jump to.
You can use strcpy if you check the bounds first (and maybe trim the source string), but it is considered bad practice. There are functions that take a maximum length, like http://www.cplusplus.com/reference/cstring/strncpy/ (note that strncpy does not '\0'-terminate the destination when the source is too long).
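Because strncpy can leave the destination unterminated, a bounded copy usually pairs it with an explicit terminator. A minimal sketch; `copy_truncating` is a hypothetical helper name, and truncating long input is an assumed policy (you may prefer to reject it instead).

```c
#include <string.h>

/* Copy src into dst (capacity dstsize) without overflowing it.
 * strncpy writes at most dstsize bytes but does NOT add a '\0' when src is
 * too long, so the last byte is set explicitly. Long input is truncated. */
void copy_truncating(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0)
        return;
    strncpy(dst, src, dstsize);
    dst[dstsize - 1] = '\0';
}
```

With the question's `char s[6]`, copying "world isnsadsdas" this way leaves s holding "world". `snprintf(dst, dstsize, "%s", src)` is an alternative that always terminates when dstsize is non-zero.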
Oh, and lastly: this is all called a buffer overflow. Some compilers (for example GCC and Clang with -fstack-protector) place a random value, a "canary", between a function's local buffers and its saved return address. Before the function returns, code inserted by the compiler checks the canary and terminates the program if it has changed. This solves a lot of security problems, but it can still be possible to copy bytes far enough into the stack to overwrite the pointer to the handler that runs when the canary has changed, letting you do the same thing. It just becomes a lot harder to do right.
In C there is no bounds checking of arrays; it's a trade-off for better performance at the risk of shooting yourself in the foot.
strcpy() doesn't care whether the target buffer is big enough so copying too many bytes will cause undefined behavior.
That is one of the reasons a new version of strcpy, strcpy_s(), was introduced, in which you can specify the target buffer size (it is part of C11's optional Annex K).
Note that sizeof(s) is determined at compile time. Use strlen() to find the number of characters s holds. When you perform strcpy(), the destination's contents are replaced by the source string, so your output won't be "Helloworld isnsadsdas".
#include <stdio.h>
#include <string.h>
int main(void)
{
char s[6] = "Hello";
char a[20] = "world isnsadsdas";
strcpy(s,a); /* still overflows s: undefined behaviour */
printf("%s\n",s);
printf("%zu\n", strlen(s));
return 0;
}
You are relying on undefined behaviour, inasmuch as the compiler has chosen to place the two arrays where your code happens to work. This may not work in the future.
As to the sizeof operator, this is figured out at compile time.
Once you use adequate array sizes you need to use strlen to fetch the length of the strings.
The best way to understand how strcpy works behind the scene is...reading its source code!
You can read the source for GLibC : http://fossies.org/dox/glibc-2.17/strcpy_8c_source.html . I hope it helps!
At the end of every string/character array there is a null terminator character '\0' which marks the end of the string/character array.
strcpy() performs its task until it sees the '\0' character.
printf() also performs its task until it sees the '\0' character.
sizeof() on the other hand is not interested in the content of the array, only its allocated size (how big it is supposed to be), thus not taking into consideration where the string/character array actually ends (how big it actually is).
As opposed to sizeof(), there is strlen() that is interested in how long the string actually is (not how long it was supposed to be) and thus counts the number of characters until it reaches the end ('\0' character) where it stops (it doesn't include the '\0' character).
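The difference between the two can be seen with a buffer that is larger than the string it holds. A small sketch; `greeting` and `print_sizes` are illustrative names.

```c
#include <stdio.h>
#include <string.h>

/* An array with room for 20 chars that currently holds a 5-char string. */
char greeting[20] = "Hello";

void print_sizes(void)
{
    /* sizeof: the array's declared size in bytes, fixed at compile time (20). */
    printf("sizeof = %zu\n", sizeof greeting);
    /* strlen: characters before the first '\0', counted at run time (5). */
    printf("strlen = %zu\n", strlen(greeting));
}
```

Note that sizeof only reports the array size when applied to the array itself; applied to a `char *` parameter it reports the size of the pointer.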
A better (more compact) solution is:
char *strcpy(char *p,char const *q)
{
char *saved=p;
while(*p++=*q++);
return saved;
}

Malloc has junk for C string?

I'm new to C, so feel free to correct mistakes.
I have some code that somewhat goes like this:
// some variables declared here like int array_size
char* cmd = (char*)malloc(array_size*sizeof(char));
for(;;){
// code here sets cmd to some string
free(cmd);
array_size = 10;
cmd = (char*)malloc(array_size*sizeof(char));
// print 1
printf(cmd);
printf("%d\n", strlen(cmd));
// repeat above for some time and then break
}
So I do the loop for a while and see what it prints. What I expected was every time the string would be empty and the length would be 0. However, that is not the case. Apparently sometimes malloc gets memory with junk and prints that out and that memory with junk has a length != 0. So I was thinking about solving this by setting all char in a new char string to '\0' when malloc returns; however, I'm pretty sure I just did something wrong. Why is it even after I free the string and do a whole new malloc that my string comes with junk unlike the first malloc? What am I doing wrong?
malloc just allocates the memory and nothing more. It makes no promises about what is in the memory. Specifically, it does not initialize memory. If you want allocated memory to be zeroed out, you can either do it manually with memset or simply call calloc (which is essentially malloc plus zeroing out the memory).
malloc does not initialise the memory. You are just lucky the first time around.
Also if it is junk and contains a % symbol you are going to have other problems.
No you did nothing wrong - malloc does not guarantee the memory will be set to 0, only that it belongs to your process.
In general, zeroing newly allocated memory is unnecessary, so in C it is never cleared implicitly; doing so would cost extra clock cycles.
There is a rather convenient function, memset, to set it if you need to.
Your code segment has, at a minimum, the following problems.
You don't ever need to multiply by sizeof(char) - it's always one.
You cast the return value of malloc. This can hide errors that would otherwise be detected, such as if you forget to include the header with the malloc prototype (so it assumes int return code).
malloc is not required to do anything with the memory it gives you, nor will it necessarily give you the same block you just freed. You can initialise it to an empty string with a simple *cmd = '\0'; after every malloc if that's what you need.
printf (cmd) is dangerous if you don't know what cmd contains. If it has a format specifier character (%), you will get into trouble. A better way is printf ("%s", cmd).
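Both fixes mentioned above can be sketched as two small helpers: terminate a malloc'd buffer explicitly, or let calloc zero it. The names `alloc_empty_string` and `alloc_zeroed_string` are hypothetical, and returning NULL for a zero size is a choice made here.

```c
#include <stdlib.h>

/* Allocate n chars that start out as an empty string.
 * Returns NULL on failure or if n == 0. */
char *alloc_empty_string(size_t n)
{
    if (n == 0)
        return NULL;
    char *p = malloc(n); /* contents indeterminate; no cast needed in C */
    if (p != NULL)
        p[0] = '\0';     /* now a valid empty string */
    return p;
}

/* Alternative: calloc zeroes every byte, so the whole buffer is '\0'. */
char *alloc_zeroed_string(size_t n)
{
    return n == 0 ? NULL : calloc(n, 1);
}
```

With either helper, `printf("%s", cmd)` prints nothing and `strlen(cmd)` is 0 right after allocation.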

How can dereferencing a NULL pointer in C not crash a program?

I need help from a real C guru to analyze a crash in my code. Not for fixing the crash; I can easily fix it, but before doing so I'd like to understand how this crash is even possible, as it seems totally impossible to me.
This crash only happens on a customer machine and I cannot reproduce it locally (so I cannot step through the code using a debugger), as I cannot obtain a copy of this user's database. My company also won't allow me to just change a few lines in the code and make a custom build for this customer (so I cannot add some printf lines and have him run the code again) and of course the customer has a build without debug symbols. In other words, my debugging abilities are very limited. Nonetheless I could nail down the crash and get some debugging information. However, when I look at that information and then at the code, I cannot understand how the program flow could ever reach the line in question. The code should have crashed long before getting to that line. I'm totally lost here.
Let's start with the relevant code. It's very little code:
// ... code above skipped, not relevant ...
if (data == NULL) return -1;
information = parseData(data);
if (information == NULL) return -1;
/* Check if name has been correctly \0 terminated */
if (information->kind.name->data[information->kind.name->length] != '\0') {
freeParsedData(information);
return -1;
}
/* Copy the name */
realLength = information->kind.name->length + 1;
*result = malloc(realLength);
if (*result == NULL) {
freeParsedData(information);
return -1;
}
strlcpy(*result, (char *)information->kind.name->data, realLength);
// ... code below skipped, not relevant ...
That's all of it. It crashes in strlcpy. I can even tell you how strlcpy is really called at runtime. strlcpy is actually called with the following parameters:
strlcpy ( 0x341000, 0x0, 0x1 );
Knowing this it is rather obvious why strlcpy crashes. It tries to read one character from a NULL pointer and that will of course crash. And since the last parameter has a value of 1, the original length must have been 0. My code clearly has a bug here, it fails to check for the name data being NULL. I can fix this, no problem.
My question is:
How can this code ever get to the strlcpy in the first place?
Why does this code not crash at the if-statement?
I tried it locally on my machine:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main (
int argc,
char ** argv
) {
char * nullString = malloc(10);
free(nullString);
nullString = NULL;
if (nullString[0] != '\0') {
printf("Not terminated\n");
exit(1);
}
printf("Can get past the if-clause\n");
char xxx[10];
strlcpy(xxx, nullString, 1);
return 0;
}
This code never gets past the if statement. It crashes in the if statement, and that is definitely expected.
So can anyone think of any reason why the first code can get past that if-statement without crashing if name->data is really NULL? This is totally mysterious to me. It doesn't seem deterministic.
Important extra information:
The code between the two comments is really complete, nothing has been left out. Further the application is single threaded, so there is no other thread that could unexpectedly alter any memory in the background. The platform where this happens is a PPC CPU (a G4, in case that could play any role). And in case someone wonders about "kind.", this is because "information" contains a "union" named "kind" and name is a struct again (kind is a union, every possible union value is a different type of struct); but this all shouldn't really matter here.
I'm grateful for any idea here. I'm even more grateful if it's not just a theory, but if there is a way I can verify that this theory really holds true for the customer.
Solution
I accepted the right answer already, but just in case anyone finds this question on Google, here's what really happened:
The pointers were pointing to memory that had already been freed. Freeing memory won't zero it or cause the process to give it back to the system at once. So even though the memory had been erroneously freed, it still contained the correct values. The pointer in question was not NULL at the time the "if check" was performed.
After that check I allocate some new memory, calling malloc. Not sure what exactly malloc does here, but every call to malloc or free can have far-reaching consequences to all dynamic memory of the virtual address space of a process. After the malloc call, the pointer is in fact NULL. Somehow malloc (or some system call malloc uses) zeros the already freed memory where the pointer itself is located (not the data it points to, the pointer itself is in dynamic memory). Zeroing that memory, the pointer now has a value of 0x0, which is equal to NULL on my system and when strlcpy is called, it will of course crash.
So the real bug causing this strange behavior was at a completely different location in my code. Never forget: freed memory keeps its values, but it is beyond your control for how long. To check whether your app accesses already freed memory, make sure freed memory is always overwritten (e.g. zeroed) when it is freed. On OS X you can do this by setting an environment variable at runtime (no need to recompile anything). Of course this slows down the program quite a bit, but you will catch those bugs much earlier.
First, dereferencing a null pointer is undefined behavior. It can crash, not crash, or set your wallpaper to a picture of SpongeBob Squarepants.
That said, dereferencing a null pointer will usually result in a crash. So your problem is probably memory corruption-related, e.g. from writing past the end of one of your strings. This can cause a delayed-effect crash. I'm particularly suspicious because it's highly unlikely that malloc(1) will fail unless your program is butting up against the end of its available virtual memory, and you would probably notice if that were the case.
Edit: OP pointed out that it isn't result that is null but information->kind.name->data. Here's a potential issue then:
There is no check for whether information->kind.name->data is null. The only check on that is
if (information->kind.name->data[information->kind.name->length] != '\0') {
Let's assume that information->kind.name->data is null, but information->kind.name->length is, say, 100. Then this statement is equivalent to:
if (*(information->kind.name->data + 100) != '\0') {
Which does not dereference NULL but rather dereferences address 100. If this does not crash, and address 100 happens to contain 0, then this test will pass.
It is possible that the structure is located in memory that has been free()'d, or the heap is corrupted. In that case, malloc() could be modifying the memory, thinking that it is free.
You might try running your program under a memory checker. One memory checker that supports Mac OS X is valgrind, although it supports Mac OS X only on Intel, not on PowerPC.
The effect of dereferencing the null pointer is undefined by standard as far as I know.
According to C Standard 6.5.3.2/4:
If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.
So there could be crash or could be not.
You may be experiencing stack corruption. The line of code you are referring to may not be executed at all.
My theory is that information->kind.name->length is a very large value so that information->kind.name->data[information->kind.name->length] is actually referring to a valid memory address.
The act of dereferencing a NULL pointer is undefined by the standard. It is not guaranteed to crash and often times won't unless you actually try and write to the memory.
As an FYI, when I see this line:
if (information->kind.name->data[information->kind.name->length] != '\0') {
I see up to three different pointer dereferences:
information
name
data (if it's a pointer and not a fixed array)
You check information for non-null, but not name and not data. What makes you so sure that they're correct?
I also echo other sentiments here about something else possibly damaging your heap earlier. If you're running on Windows, consider using gflags to enable features such as page heap, which can detect whether you or someone else is writing past the end of a buffer and stepping on your heap.
Saw that you're on a Mac, so ignore the gflags comment; it might help someone else who reads this. If you're running something earlier than OS X, there are a number of handy MacsBug tools to stress the heap (like the heap scramble command, 'hs').
I'm interested in the char* cast in the call to strlcpy.
Could the type data* be different in size than the char* on your system? If char pointers are smaller you could get a subset of the data pointer which could be NULL.
Example:
int a = 0xffff0000;
short b = (short) a; //b could be 0 if lower bits are used
Here's one specific way you can get past the 'data' pointer being NULL in
if (information->kind.name->data[information->kind.name->length] != '\0') {
Say information->kind.name->length is large, at least larger than 4096. On a particular platform with a particular compiler (say, most *nixes with a stock gcc compiler), the code will result in a memory read at address kind.name->data + kind.name->length.
At a lower level, that read is "read memory at address (0 + 8653)" (or whatever the length was).
It's common on *nixes to mark the first page in the address space as "not accessible", meaning dereferencing a NULL pointer that reads memory address 0 to 4096 will result in a hardware trap being propagated to the application and crash it.
Reading past that first page, you might happen to poke into valid mapped memory, e.g. a shared library or something else that happened to be mapped there - and the memory access will not fail. And that's ok. Dereferencing a NULL pointer is undefined behavior, nothing requires it to fail.
Missing '{' after last if statement means that something in the "// ... code above skipped, not relevant ..." section is controlling access to that entire fragment of code. Out of all the code pasted only the strlcpy is executed. Solution: never use if statements without curly brackets to clarify control.
Consider this...
if(false)
{
if(something == stuff)
{
doStuff();
.. snip ..
if(monkey == blah)
some->garbage= nothing;
return -1;
}
}
crash();
Only "crash();" gets executed.
I would run your program under valgrind. You already know there's a problem with NULL pointers, so profile that code.
The advantage valgrind brings here is that it checks every single memory access, reports whether the location is valid (allocated and not yet freed) or uninitialized, and tells you the line number, structure, and anything else you care to know about memory.
As everyone else mentioned, referencing the 0 memory location is a "que sera, sera" kinda thing.
My C tinged spidey sense is telling me that you should break out those structure walks on the
if (information->kind.name->data[information->kind.name->length] != '\0') {
line like
if (information == NULL) {
return -1;
}
if (information->kind.name == NULL) {
return -1;
}
and so on.
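Broken out fully, the walk might look like the sketch below. The real structs are not shown in the question, so `struct name_info` and `name_is_valid` are hypothetical stand-ins that only mirror the `length` and `data` fields. Note that checks like these catch NULL pointers, but not dangling pointers into freed memory (which turned out to be the real cause); that still needs a tool such as valgrind.

```c
#include <stddef.h>

/* Hypothetical layout standing in for information->kind.name. */
struct name_info {
    size_t length;
    unsigned char *data;
};

/* Returns 1 only if every pointer on the path is non-NULL and
 * data[length] is the terminating '\0'; otherwise returns 0. */
int name_is_valid(const struct name_info *name)
{
    if (name == NULL)
        return 0;
    if (name->data == NULL)
        return 0; /* checked BEFORE indexing data[length] */
    return name->data[name->length] == '\0';
}
```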
Wow, that's strange. One thing does look slightly suspicious to me, though it may not contribute:
What would happen if information and data were good pointers (non-null), but information.kind.name was null? You don't dereference this pointer until the strlcpy line, so if it were null, it might not crash until then. Of course, earlier than that you do dereference data[1] to set it to \0, which should also crash, but due to whatever fluke, your program may just happen to have write access to 0x01 but not 0x00.
Also, I see you use information->name.length in one place but information->kind.name.length in another; not sure if that's a typo or if that's desired.
Despite the fact that dereferencing a null pointer leads to undefined behaviour and not necessarily to a crash, you should check the value of information->kind.name->data and not the contents of information->kind.name->data[1].
char * p = NULL;
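p[i] is the same as *(p + i), and the address is computed first, like
p += i;
which in practice works even on a null pointer (strictly speaking, arithmetic on a null pointer is undefined behaviour too). It then points at memory location 0x0000[...]i, and only the subsequent read actually touches memory.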
You should always check whether information->kind.name->data is null anyway, but in this case
in
if (*result == NULL)
freeParsedData(information);
return -1;
}
you have missed a {
it should be
if (*result == NULL)
{
freeParsedData(information);
return -1;
}
This is a good reason for this coding style, instead of
if (*result == NULL) {
freeParsedData(information);
return -1;
}
where you might not spot the missing brace because you are used to the shape of the code block without the brace separating it from the if clause.
*result = malloc(realLength); // ???
Address of newly allocated memory segment is stored at the location referenced by the address contained in the variable "result".
Is this the intent? If so, the strlcpy may need modification.
As I understand it, this is a special case of invalid access: an attempt to read or write using a NULL pointer. Detection of the problem is very much hardware dependent. On some platforms, reading or writing memory through a NULL pointer results in an exception.
