C: dynamic char-array crashes heap - c

I have yet again a question about the workings of C. (ANSI-C compiled by VS2012)
I am refactoring a standalone program (.exe) into a .dll. This works fine so far, but I stumble across problems when it comes to logging. Let me explain:
The original program - while running - wrote a log-file and printed information to the screen. Since my dll is going to run on a webserver, accessed by many people simultaneously there is
no real chance to handle log-files properly (and clean up after them)
no console-window anyone would see
So my goal is to write everything that would be put in the log-file or on the screen into string-like variables (I know that there are no strings in C) which I can then pass on request to the caller (also a dll, but written in C#).
Since in C such a thing is not possible:
char z88rlog;
z88rlog="First log-entry\n";
z88rlog+="Second log-entry\n";
I have two possibilities:
char z88rlog[REALLY_HUGE];
dynamically allocating memory
In my mind the first way is to be ignored because:
The potential waste of memory is rather enormous
I still may need more memory than REALLY_HUGE, thus creating a buffer overflow
which leaves me with the second way. I have done some work on that and came up with two solutions, either of which doesn't work properly.
/* Solution 1 */
void logpr(char* tmpstr)
{
extern char *z88rlog;
if (z88rlog==NULL)
{
z88rlog=malloc(strlen(tmpstr)+1);
strcpy(z88rlog,tmpstr);
}
else
{
z88rlog=realloc(z88rlog,strlen(z88rlog)+strlen(tmpstr));
z88rlog=strcat(z88rlog,tmpstr);
}
}
In solution 1 (and likewise in solution 2, as you will see) I pass my new log-entry through char tmpstr[255];. My "log-file" z88rlog is declared globally, so I need extern to access it. I then check whether memory has been allocated for z88rlog. If not, I allocate memory the size of my log-entry (+1 for my \0) and copy the contents of tmpstr into z88rlog. If so, I realloc memory for z88rlog to the size it had plus the length of tmpstr (+1). Then the two "strings" are joined using strcat. Using breakpoints and the Immediate window I obtained the following output:
z88rlog
0x00000000 <Bad Ptr>
z88rlog
0x0059ef80 "start Z88R version 14OS"
z88rlog
0x0059ef80 "start Z88R version 14OS
opening file Z88.DYNÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍýýýý««««««««þîþîþîþ"
It shows three consecutive calls of logpr (breakpoint before strcpy/strcat). The unintelligible gibberish at the end results from memory allocation. After that VS gives an error message that something caused the debugger to set a breakpoint in realloc.c. Because this obviously doesn't work I concocted my wonderful solution 2:
/* Solution 2 */
void logpr(char* tmpstr)
{
extern char *z88rlog;
char *z88rlogtmp;
if (z88rlog==NULL)
{
z88rlog=malloc(strlen(tmpstr)+1);
strcpy(z88rlog,tmpstr);
}
else
{
z88rlogtmp=malloc(strlen(z88rlog)+strlen(tmpstr+1));
z88rlogtmp=strcat(z88rlog,tmpstr);
free(z88rlog);
z88rlog=malloc(strlen(z88rlogtmp)+1);
memcpy(z88rlog,z88rlogtmp,strlen(z88rlogtmp)+1);
free(z88rlogtmp);
}
}
Here my aim is to create a copy of my log-file, free the original's memory, allocate new memory for the original in the new size and copy the contents back. And I must not forget to free the temporary copy, since it's allocated via malloc. This crashes instantly when it reaches free, again telling me that the heap might be broken.
So let's comment out free for the time being. This does work better - much to my relief - but while building the log-string, suddenly not all characters from z88rlogtmp get copied. Still, everything kind of works. Until suddenly I am told again that the heap might be broken and the debugger puts a breakpoint at the end of _heap_alloc (size_t size) in malloc.c; size has - according to the debugger - the value of 1041.
So I have 2 (or 3) ways to achieve this "string-growing" but none works. Might the error giving me the size point to the conclusion that the array has become too big? I hope I explained well what I want to do and someone can help me :-) Thanks in advance!
irony on Maybe I should just go and buy some new heap for the computer. Does it fit in RAM-slots? Can anyone recommend a good brand? irony off

This is one mistake in Solution 1:
z88rlog=realloc(z88rlog,strlen(z88rlog)+strlen(tmpstr));
as no space is allocated for the terminating null character. Note that you must store the result of realloc() to a temporary variable to avoid memory leak in the event of failure. To correct:
char* tmp = realloc(z88rlog, strlen(z88rlog) + strlen(tmpstr) + 1);
if (tmp)
{
z88rlog = tmp;
/* ... */
}
Mistakes in Solution 2:
z88rlogtmp=malloc(strlen(z88rlog)+strlen(tmpstr+1));
/*^^^^^^^^^*/
This calculates one less than the length of tmpstr, because the + 1 is applied to the pointer before strlen() sees it. To correct:
z88rlogtmp=malloc(strlen(z88rlog) + strlen(tmpstr) + 1);
Pointer reassignment resulting in undefined behaviour:
z88rlogtmp=strcat(z88rlog,tmpstr);
/* Now, 'z88rlogtmp' and 'z88rlog' point to the same memory. */
free(z88rlog);
/* 'z88rlogtmp' now points to deallocated memory. */
z88rlog=malloc(strlen(z88rlogtmp)+1);
/* This call ^^^^^^^^^^^^^^^^^^ is undefined behaviour,
and from this point on anything can happen. */
memcpy(z88rlog,z88rlogtmp,strlen(z88rlogtmp)+1);
free(z88rlogtmp);
Additionally, if the code is executing within a Web Server it is almost certainly operating in a multi-threaded environment. As you have a global variable it will need synchronized access.

You seem to have many problems. To start with, in your realloc call you don't allocate space for the terminating '\0' character. In your second solution you have strlen(tmpstr+1), which isn't correct. In your second solution you also use strcat to append to the existing buffer z88rlog, and if it's not big enough you overwrite unallocated memory, or data allocated for something else. The first argument to strcat is the destination, and that is also what the function returns, so you lose the newly allocated memory too.
The first solution, with realloc, should work fine, if you just remember to allocate that extra character.
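A minimal sketch of Solution 1 with both fixes applied (the extra byte for the terminator, and a temporary for the realloc result). The name log_append and its int return value are assumptions made for illustration; z88rlog is defined here only so the sketch is self-contained.

```c
#include <stdlib.h>
#include <string.h>

char *z88rlog = NULL;   /* the global log buffer from the question */

/* Appends tmpstr to z88rlog (hypothetical helper).
   Returns 0 on success, -1 if allocation fails; on failure the
   existing log is left untouched. */
int log_append(const char *tmpstr)
{
    size_t oldlen = (z88rlog != NULL) ? strlen(z88rlog) : 0;
    size_t addlen = strlen(tmpstr);

    /* +1 for the terminating '\0'; realloc(NULL, n) behaves like malloc(n) */
    char *tmp = realloc(z88rlog, oldlen + addlen + 1);
    if (tmp == NULL)
        return -1;

    memcpy(tmp + oldlen, tmpstr, addlen + 1);   /* copies the '\0' too */
    z88rlog = tmp;
    return 0;
}
```

Because realloc accepts a NULL pointer, the separate malloc branch for the first log entry is no longer needed.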

In solution 1, you would need to allocate space for the terminating null character. Hence, the realloc should request one more byte, i.e.
z88rlog=realloc(z88rlog,strlen(z88rlog)+strlen(tmpstr) + 1);
In the second solution, I am not sure about z88rlogtmp=strcat(z88rlog,tmpstr); because z88rlog is the destination string. In case you wish to use malloc only, then
z88rlogtmp=malloc(strlen(z88rlog)+1); // Allocate a temporary string
strcpy(z88rlogtmp,z88rlog); // Make a copy
free(z88rlog); // Free current string
z88rlog=malloc(strlen(z88rlogtmp) + strlen(tmpstr) + 1); // Re-allocate memory
strcpy(z88rlog, z88rlogtmp); // Copy first string
strcat(z88rlog, tmpstr); // Concatenate the next string
free(z88rlogtmp); // Free the temporary string

Related

Weird situation when returning char *

I am pretty new to C programming and I have several functions returning type char *
Say I declare char a[some_int];, and I fill it later on. When I attempt to return it at the end of the function, it will only return the char at the first index. One thing I noticed, however, is that it will return the entirety of a if I call any sort of function on it prior to returning it. For example, my function to check the size of a string (calling something along the lines of strLength(a);).
I'm very curious what the situation is with this exactly. Again, I'm new to C programming (as you probably can tell).
EDIT: Additionally, if you have any advice concerning the best method of returning this, please let me know. Thanks!
EDIT 2: For example:
I have char ret[my_strlen(a) + my_strlen(b)]; in which a and b are strings and my_strlen returns their length.
Then I loop through filling ret using ret[i] = a[i]; and incrementing.
When I call my function that prints an input string (as a test), it prints out how I want it, but when I do
return ret;
or even
char *ptr = ret;
return ptr;
it never supplies me with the full string, just the first char.
One way that does not work for returning a chunk of char data is to return it in memory temporarily allocated on the stack during the execution of your function; after the function returns, that memory has (most probably) already been reused for another purpose.
A working alternative would be to allocate the chunk of memory on the heap. Make sure you read up about and understand the difference between stack and heap memory! The malloc() family of functions is your friend if you choose to return your data in a chunk of memory allocated on the heap (see man malloc).
char* a = (char*) malloc(some_int * sizeof(char)) should help in your case. Make sure you don't forget to free the memory once you don't need it any more.
char* ret = (char*) malloc((my_strlen(a) + my_strlen(b) + 1) * sizeof(char)) for the second example given (the + 1 makes room for the terminating '\0'). Again, don't forget to free the memory once it isn't used any more.
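A minimal sketch of the heap-based approach for the concatenation case from EDIT 2. The function name concat_alloc is hypothetical, and the standard strlen stands in for the asker's my_strlen. Note the extra byte for the terminating '\0'.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string holding a followed by b,
   or NULL if allocation fails. The caller must free() the result. */
char *concat_alloc(const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);
    char *ret = malloc(la + lb + 1);   /* +1 for the terminating '\0' */
    if (ret == NULL)
        return NULL;
    memcpy(ret, a, la);
    memcpy(ret + la, b, lb + 1);       /* includes b's '\0' */
    return ret;
}
```

The returned pointer stays valid after the function returns, which is exactly what a local array cannot offer.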
As MByD correctly pointed out, it is not forbidden in general to use memory allocated on the stack to pass chunks of data in and out of functions. As long as the chunk is not allocated on the stack of the function that returns it, this is perfectly fine.
In the scenario below, function b works on a chunk of memory allocated in the stack frame that is created when function a is entered and lives until a returns. So everything is fine even though no heap memory is involved.
void b(char input[]){
/* do something useful here */
}
void a(){
char buf[BUFFER_SIZE];
b(buf);
/* use data filled in by b here */
}
As still another option you may choose to leave the memory allocation to the compiler by using a global variable, which has static storage duration rather than living on the stack or the heap. I'd count this option at least in the last-resort category: if not handled properly, global variables are the main culprits in raising problems with reentrancy and multithreaded applications.
Happy hacking and good luck on your learning C mission.

What happens to memory after '\0' in a C string?

Surprisingly simple/stupid/basic question, but I have no idea: Suppose I want to return the user of my function a C-string, whose length I do not know at the beginning of the function. I can place only an upper bound on the length at the outset, and, depending on processing, the size may shrink.
The question is, is there anything wrong with allocating enough heap space (the upper bound) and then terminating the string well short of that during processing? i.e. If I stick a '\0' into the middle of the allocated memory, does (a.) free() still work properly, and (b.) does the space after the '\0' become inconsequential? Once '\0' is added, does the memory just get returned, or is it sitting there hogging space until free() is called? Is it generally bad programming style to leave this hanging space there, in order to save some upfront programming time computing the necessary space before calling malloc?
To give this some context, let's say I want to remove consecutive duplicates, like this:
input "Hello oOOOo !!" --> output "Helo oOo !"
... and some code below showing how I'm pre-computing the size resulting from my operation, effectively performing processing twice to get the heap size right.
char* RemoveChains(const char* str)
{
if (str == NULL) {
return NULL;
}
if (strlen(str) == 0) {
char* outstr = (char*)malloc(1);
*outstr = '\0';
return outstr;
}
const char* original = str; // for reuse
char prev = *str++; // [prev][str][str+1]...
unsigned int outlen = 1; // first char auto-counted
// Determine length necessary by mimicking processing
while (*str) {
if (*str != prev) { // new char encountered
++outlen;
prev = *str; // restart chain
}
++str; // step pointer along input
}
// Declare new string to be perfect size
char* outstr = (char*)malloc(outlen + 1);
outstr[outlen] = '\0';
outstr[0] = original[0];
outlen = 1;
// Construct output
prev = *original++;
while (*original) {
if (*original != prev) {
outstr[outlen++] = *original;
prev = *original;
}
++original;
}
return outstr;
}
If I stick a '\0' into the middle of the allocated memory, does
(a.) free() still work properly, and
Yes.
(b.) does the space after the '\0' become inconsequential? Once '\0' is added, does the memory just get returned, or is it sitting there hogging space until free() is called?
Depends. Often, when you allocate large amounts of heap space, the system first allocates virtual address space - as you write to the pages some actual physical memory is assigned to back it (and that may later get swapped out to disk when your OS has virtual memory support). Famously, this distinction between wasteful allocation of virtual address space and actual physical/swap memory allows sparse arrays to be reasonably memory efficient on such OSs.
Now, the granularity of this virtual addressing and paging is in memory page sizes - that might be 4k, 8k, 16k...? Most OSs have a function you can call to find out the page size. So, if you're doing a lot of small allocations then rounding up to page sizes is wasteful, and if you have a limited address space relative to the amount of memory you really need to use then depending on virtual addressing in the way described above won't scale (for example, 4GB RAM with 32-bit addressing). On the other hand, if you have a 64-bit process running with say 32GB of RAM, and are doing relatively few such string allocations, you have an enormous amount of virtual address space to play with and the rounding up to page size won't amount to much.
But - note the difference between writing throughout the buffer then terminating it at some earlier point (in which case the once-written-to memory will have backing memory and could end up in swap) versus having a big buffer in which you only ever write to the first bit then terminate (in which case backing memory is only allocated for the used space rounded up to page size).
It's also worth pointing out that on many operating systems heap memory may not be returned to the Operating System until the process terminates: instead, the malloc/free library notifies the OS when it needs to grow the heap (e.g. using sbrk() on UNIX or VirtualAlloc() on Windows). In that sense, free() memory is free for your process to re-use, but not free for other processes to use. Some Operating Systems do optimise this - for example, using a distinct and independently releasable memory region for very large allocations.
Is it generally bad programming style to leave this hanging space there, in order to save some upfront programming time computing the necessary space before calling malloc?
Again, it depends on how many such allocations you're dealing with. If there are a great many relative to your virtual address space / RAM - you want to explicitly let the memory library know not all the originally requested memory is actually needed using realloc(), or you could even use strdup() to allocate a new block more tightly based on actual needs (then free() the original) - depending on your malloc/free library implementation that might work out better or worse, but very few applications would be significantly affected by any difference.
Sometimes your code may be in a library where you can't guess how many string instances the calling application will be managing - in such cases it's better to provide slower behaviour that never gets too bad... so lean towards shrinking the memory blocks to fit the string data (a set number of additional operations, so it doesn't affect big-O efficiency) rather than having an unknown proportion of the original string buffer wasted (in a pathological case - zero or one character used after arbitrarily large allocations). As a performance optimisation you might only bother returning memory if the unused space is >= the used space - tune to taste, or make it caller-configurable.
You comment on another answer:
So it comes down to judging whether the realloc will take longer, or the preprocessing size determination?
If performance is your top priority, then yes - you'd want to profile. If you're not CPU bound, then as a general rule take the "preprocessing" hit and do a right-sized allocation - there's just less fragmentation and mess. Countering that, if you have to write a special preprocessing mode for some function - that's an extra "surface" for errors and code to maintain. (This trade-off decision is commonly needed when implementing your own asprintf() from snprintf(), but there at least you can trust snprintf() to act as documented and don't personally have to maintain it).
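The asprintf()-from-snprintf() trade-off mentioned here can be sketched as a measure-then-write pair of calls. This assumes a C99-conforming vsnprintf, which returns the required length when given a NULL buffer of size 0; some older C runtimes do not behave this way. The name format_alloc is hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Formats into a freshly allocated, exactly sized buffer.
   Returns NULL on formatting or allocation failure; caller frees. */
char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    char *buf = NULL;
    int needed;

    va_start(ap, fmt);
    va_copy(ap2, ap);                        /* a va_list can be walked only once */

    needed = vsnprintf(NULL, 0, fmt, ap);    /* pass 1: measure only */
    va_end(ap);

    if (needed >= 0) {
        buf = malloc((size_t)needed + 1);    /* +1 for the terminating '\0' */
        if (buf != NULL)
            vsnprintf(buf, (size_t)needed + 1, fmt, ap2);   /* pass 2: write */
    }
    va_end(ap2);
    return buf;
}
```

The "preprocessing" pass costs one extra formatting run, but the library does the counting, so there is no hand-written sizing logic to maintain.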
Once '\0' is added, does the memory just get returned, or is it
sitting there hogging space until free() is called?
There's nothing magical about \0. You have to call realloc if you want to "shrink" the allocated memory. Otherwise the memory will just sit there until you call free.
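A small sketch of shrinking an over-allocated buffer once the final string length is known. The helper name shrink_to_fit is an assumption; the buffer must have come from malloc, calloc or realloc and must hold a terminated string.

```c
#include <stdlib.h>
#include <string.h>

/* Shrinks a heap buffer to exactly fit the C string it holds.
   Returns the (possibly moved) buffer. If realloc fails, the original
   block is still valid and is returned unchanged. */
char *shrink_to_fit(char *buf)
{
    char *tmp = realloc(buf, strlen(buf) + 1);   /* +1 keeps the '\0' */
    return (tmp != NULL) ? tmp : buf;
}
```

A caller would write buf = shrink_to_fit(buf); and keep using free(buf) as before, since realloc may hand back a different address.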
If I stick a '\0' into the middle of the allocated memory, does (a.)
free() still work properly
Whatever you do in that memory free will always work properly if you pass it the exact same pointer returned by malloc. Of course if you write outside it all bets are off.
\0 is just one more character from the perspective of malloc and free; they don't care what data you put in the memory. So free will still work whether you add \0 in the middle or don't add \0 at all. The extra space allocated will still be there; it won't be returned to the process as soon as you add \0 to the memory. I personally would prefer to allocate only the required amount of memory instead of allocating some upper bound, as that just wastes the resource.
As soon as you get memory from heap by calling malloc(), the memory is yours to use. Inserting \0 is like inserting any other character. This memory will remain in your possession until you free it or until OS claims it back.
The \0 is a pure convention for interpreting character arrays as strings - it is independent of the memory management. I.e., if you want to get your money back, you should call realloc. The string does not care about memory (which is a source of many security problems).
malloc just allocates a chunk of memory. It's up to you to use it however you want and to call free with the initial pointer position. Inserting '\0' in the middle has no consequence.
To be specific, malloc doesn't know what type of memory you want (it returns only a void pointer).
Let us assume you wish to allocate 10 bytes of memory starting 0x10 to 0x19 ..
char * ptr = (char *)malloc(sizeof(char) * 10);
Inserting a null at 5th position (0x14) does not free the memory 0x15 onwards...
However a free from 0x10 frees the entire chunk of 10 bytes..
free() will still work with a NUL byte in memory
the space will remain wasted until free() is called, or unless you subsequently shrink the allocation
Generally, memory is memory is memory. It doesn't care what you write into it. BUT it has a race, or if you prefer a flavor (malloc, new, VirtualAlloc, HeapAlloc, etc). This means that the party that allocates a piece of memory must also provide the means to deallocate it. If your API comes in a DLL, then it should provide a free function of some sort.
This of course puts a burden on the caller right?
So why not put the WHOLE burden on the caller?
The BEST way to deal with dynamically allocated memory is to NOT allocate it yourself. Have the caller allocate it and pass it on to you. He knows what flavor he allocated, and he is responsible to free it whenever he is done using it.
How does the caller know how much to allocate?
Like many Windows APIs have your function return the required amount of bytes when called e.g. with a NULL pointer, then do the job when provided with a non-NULL pointer (using IsBadWritePtr if it is suitable for your case to double-check accessibility).
This can also be much much more efficient. Memory allocations COST a lot. Too many memory allocations cause heap fragmentation and then the allocations cost even more. That's why in kernel mode we use the so called "look-aside lists". To minimize the number of memory allocations done, we reuse the blocks we have already allocated and "freed", using services that the NT Kernel provides to driver writers.
If you pass on the responsibility for memory allocation to your caller, then he might be passing you cheap memory from the stack (_alloca), or passing you the same memory over and over again without any additional allocations. You don't care of course, but you DO allow your caller to be in charge of optimal memory handling.
To elaborate on the use of the NULL terminator in C:
You cannot allocate a "C string" you can allocate a char array and store a string in it, but malloc and free just see it as an array of the requested length.
A C string is not a data type but a convention for using a char array where the null character '\0' is treated as the string terminator.
This is a way to pass strings around without having to pass a length value as a separate argument. Some other programming languages have explicit string types that store a length along with the character data to allow passing strings in a single parameter.
Functions that document their arguments as "C strings" are passed char arrays but have no way of knowing how big the array is without the null terminator so if it is not there things will go horribly wrong.
You will notice functions that expect char arrays that are not necessarily treated as strings will always require a buffer length parameter to be passed.
For example if you want to process char data where a zero byte is a valid value you can't use '\0' as a terminator character.
You could do what some of the MS Windows APIs do where you (the caller) pass a pointer and the size of the memory you allocated. If the size isn't enough, you're told how many bytes to allocate. If it was enough, the memory is used and the result is the number of bytes used.
Thus the decision about how to efficiently use memory is left to the caller. They can allocate a fixed MAX_PATH bytes (common when working with paths in Windows) and use the result from the function call to know whether more bytes are needed (not the case with paths, since MAX_PATH is 260 unless you bypass that Win32 API limit) or whether most of the bytes can be ignored...
The caller could also pass zero as the memory size and be told exactly how much needs to be allocated - not as efficient processing-wise, but could be more efficient space-wise.
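A sketch of this caller-allocates convention applied to the duplicate-removal example from the question. The function name remove_chains_into and its exact contract (return the required size, write only when the buffer is large enough) are assumptions for illustration, not any particular Windows API.

```c
#include <stddef.h>

/* Writes str with consecutive duplicate characters removed into out.
   Returns the number of bytes required, including the terminating '\0'.
   If out is NULL or outsize is too small, nothing is written and the
   caller can retry with a buffer of the returned size. */
size_t remove_chains_into(const char *str, char *out, size_t outsize)
{
    size_t needed = 1;                 /* the terminating '\0' */
    const char *s;

    for (s = str; *s != '\0'; ++s)     /* pass 1: count chain starts */
        if (s == str || *s != s[-1])
            ++needed;

    if (out == NULL || outsize < needed)
        return needed;

    for (s = str; *s != '\0'; ++s)     /* pass 2: copy chain starts */
        if (s == str || *s != s[-1])
            *out++ = *s;
    *out = '\0';
    return needed;
}
```

The caller can first pass NULL to learn the size and then allocate however it likes (stack array, malloc, a reused buffer), so the library never has to expose a matching free function.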
You can certainly preallocate to an upperbound, and use all or something less.
Just make sure you actually use all or something less.
Making two passes is also fine.
You asked the right questions about the tradeoffs.
How do you decide?
Use two passes, initially, because:
1. you'll know you aren't wasting memory.
2. you're going to profile to find out where
you need to optimize for speed anyway.
3. upperbounds are hard to get right before
you've written and tested and modified and
used and updated the code in response to new
requirements for a while.
4. simplest thing that could possibly work.
You might tighten up the code a little, too.
Shorter is usually better. And the more the
code takes advantage of known truths, the more
comfortable I am that it does what it says.
char* copyWithoutDuplicateChains(const char* str)
{
if (str == NULL) return NULL;
const char* s = str;
char prev = *s; // [prev][s+1]...
size_t outlen = 1; // first character (or the terminator) counted
// Determine length necessary by mimicking processing
while (*s)
{ while (*++s == prev); // skip duplicates
++outlen; // new character (or terminator) encountered
prev = *s; // restart chain
}
// Construct output; outlen already includes the terminating '\0'
char* outstr = (char*)malloc(outlen);
if (outstr == NULL) return NULL;
char* out = outstr; // write cursor; outstr keeps the start
s = str;
prev = *s; // reset chain for the second pass
*out++ = *s; // first character copied
while (*s)
{ while (*++s == prev); // skip duplicates
*out++ = *s; // copy new character (finally the '\0')
prev = *s; // restart chain
}
// done
return outstr;
}

malloc function crash

I have a problem with memory allocation using malloc.
Here is a fragment from my code:
printf("DEBUG %d\n",L);
char *s=(char*)malloc(L+2);
if(s==0)
{
printf("DEBUGO1");
}
printf("DEBUGO2\n");
It outputs "DEBUG 3", and then an error msgbox appears with this message:
The instruction at 0x7c9369aa referenced memory at "0x0000000". The
memory could not be read
For me such behavior is very strange.
What can be wrong here?
The application is single threaded.
I'm using mingw C compiler that is built in code::blocks 10.05
I can provide all the code if it is needed.
Thanks.
UPD1:
There is more code:
char *concat3(char *str1,char *str2,char *str3)
{
/*concatenate three strings and frees the memory allocated for substrings before*/
/* returns a pointer to the new string*/
int L=strlen(str1)+strlen(str2)+strlen(str3);
printf("DEBUG %d\n",L);
char *s=(char*)malloc(L+2);
if(s==0)
{
printf("DEBUGO1");
}
printf("DEBUGO2\n");
sprintf(s,"%s%s%s",str1,str2,str3);
free(str1);
free(str2);
free(str3);
return s;
}
UPD2:
It seems the problem is more complicated than i thought. Just if somebody has enough time for helping me out:
Here is all the code
Proj
(it is code::blocks 10.05 project,but you may compile the sources without an ide ,it is pure C without any libraries):
call the program as
"cbproj.exe s.pl" (the s.pl file is in the root of the archive)
and you may see it crashes when it calls the function "malloc" that is on the 113th line of "parser.tab.c" (where the function concat3 is written).
I do the project for educational purposes; you may use the source code without any restrictions.
UPD3:
The problem was that not enough memory was allocated for one of the strings in the program, but it seemed to work until the next malloc. Oh, I hate C now :)
I agree with the comments about bad coding style; I need to improve myself in this.
The problem with this exact code is that when malloc fails, you don't return from the function but use this NULL-pointer further in sprintf call as a buffer.
I'd also suggest you to free memory allocated for str1, str2 and str3 outside this function, or else you might put yourself into trouble somewhere else.
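A sketch of concat3 with both suggestions applied: it returns early when malloc fails, and it leaves freeing the inputs to the caller. The const qualifiers and the use of snprintf are additions for illustration, not part of the asker's code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Concatenates three strings into a new heap buffer.
   Returns NULL if allocation fails; does not free its arguments. */
char *concat3(const char *str1, const char *str2, const char *str3)
{
    size_t len = strlen(str1) + strlen(str2) + strlen(str3);
    char *s = malloc(len + 1);      /* +1 for the terminating '\0' */
    if (s == NULL)
        return NULL;                /* bail out instead of writing through NULL */
    snprintf(s, len + 1, "%s%s%s", str1, str2, str3);
    return s;
}
```

With ownership kept at the call site, passing a string literal or the same pointer twice no longer leads to an invalid or double free.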
EDIT: after running your program under valgrind, two real problems revealed (in parser.tab.c):
In yyuserAction,
char *applR=(char*)malloc(strlen(ruleName)+7);
sprintf(applR,"appl(%s).",ruleName);
+7 is insufficient since you also need space for \0 char at the end of string. Making it +8 helped.
In SplitList,
char *curstr=(char*)malloc(leng);
there's a possibility of allocating zero bytes. leng + 1 helps.
After aforementioned changes, everything runs fine (if one could say so, since I'm not going to count memory leaks).
From the error message it actually looks like your if statement is not quite what you have posted here. It suggests that your if statement might be something like this:
if(s=0) {
}
Note the single = (assignment) instead of == (equality).
You cannot use free on pointers that were not created by malloc, calloc or realloc. From the Manpage:
free() frees the memory space pointed to by ptr, which must have been returned by a previous call to malloc(), calloc() or realloc(). Otherwise, or if free(ptr) has already been called before, undefined behavior occurs. If ptr is NULL, no operation is performed.

Why does this intentionally incorrect use of strcpy not fail horribly?

Why does the below C code using strcpy work just fine for me? I tried to make it fail in two ways:
1) I tried strcpy from a string literal into allocated memory that was too small to contain it. It copied the whole thing and didn't complain.
2) I tried strcpy from an array that was not NUL-terminated. The strcpy and the printf worked just fine. I had thought that strcpy copied chars until a NUL was found, but none was present and it still stopped.
Why don't these fail? Am I just getting "lucky" in some way, or am I misunderstanding how this function works? Is it specific to my platform (OS X Lion), or do most modern platforms work this way?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
char *src1 = "123456789";
char *dst1 = (char *)malloc( 5 );
char src2[5] = {'h','e','l','l','o'};
char *dst2 = (char *)malloc( 6 );
printf("src1: %s\n", src1);
strcpy(dst1, src1);
printf("dst1: %s\n", dst1);
strcpy(dst2, src2);
printf("src2: %s\n", src2);
dst2[5] = '\0';
printf("dst2: %s\n", dst2);
return 0;
}
The output from running this code is:
$ ./a.out
src1: 123456789
dst1: 123456789
src2: hello
dst2: hello
First, copying into an array that is too small:
C has no protection for going past array bounds, so if there is nothing sensitive at dst1[5..9], then you get lucky, and the copy goes into memory that you don't rightfully own, but it doesn't crash either. However, that memory is not safe, because it has not been allocated to your variable. Another variable may well have that memory allocated to it, and later overwrite the data you put in there, corrupting your string later on.
Secondly, copying from an array that is not null-terminated:
Even though we're usually taught that memory is full of arbitrary data, huge chunks of it are zeroed out. Even though you didn't put a null-terminator in src2, chances are good that src2[5] happens to be \0 anyway. This makes the copy succeed. Note that this is NOT guaranteed, and could fail on any run, on any platform, at any time. But you got lucky this time (and probably most of the time), and it worked.
Overwriting beyond the bounds of allocated memory causes Undefined Behavior.
So in a way yes you got lucky.
Undefined behavior means anything can happen and the behavior cannot be explained as the Standard, which defines the rules of the language, does not define any behavior.
EDIT:
On second thought, I would say you are really unlucky here that the program works fine and does not crash. That it works now does not mean it will always work; in fact it is a ticking bomb.
As per Murphy's Law:
"Anything that can go wrong will go wrong"["and most likely at the most inconvenient possible moment"]
[ ]- Is my edit to the Law :)
Yes, you're quite simply getting lucky.
Typically, the heap is contiguous. This means that when you write past the malloced memory, you could be corrupting the following memory block, or some internal data structures that may exist between user memory blocks. Such corruption often manifests itself long after the offending code, which makes debugging this type of bugs difficult.
You're probably getting the NULs because the memory happens to be zero-filled (which isn't guaranteed).
As @Als said, this is undefined behaviour. This may crash, but it doesn't have to.
Many memory managers allocate in larger chunks of memory and then hand it to the "user" in smaller chunks, probably a multiple of 4 or 8 bytes. So your write over the boundary probably simply writes into the extra bytes allocated. Or it overwrites one of the other variables you have.
You're not malloc-ing enough bytes there. The first string, "123456789" is 10 bytes (the null terminator is present), and {'h','e','l','l','o'} is 6 bytes (again, making room for the null terminator). You're currently clobbering the memory with that code, which leads to undefined (i.e. odd) behavior.
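For comparison, a sketch of how both copies could be sized correctly. The helper names copy_string and copy_bytes_as_string are hypothetical; the second one takes an explicit length because an array like src2 has no terminator for strcpy to find.

```c
#include <stdlib.h>
#include <string.h>

/* Copies a NUL-terminated string into a right-sized heap buffer.
   Returns NULL if allocation fails; caller frees. */
char *copy_string(const char *src)
{
    size_t n = strlen(src) + 1;          /* characters plus '\0' */
    char *dst = malloc(n);
    if (dst != NULL)
        memcpy(dst, src, n);
    return dst;
}

/* Copies len bytes that are NOT NUL-terminated and appends the '\0'.
   Returns NULL if allocation fails; caller frees. */
char *copy_bytes_as_string(const char *src, size_t len)
{
    char *dst = malloc(len + 1);
    if (dst != NULL) {
        memcpy(dst, src, len);
        dst[len] = '\0';
    }
    return dst;
}
```

Applied to the question's data, the first helper would allocate 10 bytes for "123456789" and the second 6 bytes for the five-character src2 array.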

Strcpy() corrupts the copied string in Solaris but not Linux

I'm writing a C code for a class. This class requires that our code compile and run on the school server, which is a sparc solaris machine. I'm running Linux x64.
I have this line to parse (THIS IS NOT ACTUAL CODE BUT IS INPUT TO MY PROGRAM):
while ( cond1 ){
I need to capture the "while" and the "cond1" into separate strings. I've been using strtok() to do this. In Linux, the following lines:
char *cond = NULL;
cond = (char *)malloc(sizeof(char));
memset(cond, 0, sizeof(char));
strcpy(cond, strtok(NULL, ": \t\(){")); //already got the "while" out of the line
will correctly capture the string "cond1". Running this on the solaris machine, however, gives me the string "cone1".
Note that in plenty of other cases within my program, strings are being copied correctly. (For instance, the "while") was captured correctly.
Does anyone know what is going on here?
The line:
cond = (char *)malloc(sizeof(char));
allocates exactly one char for storage, into which you are then copying more than one - strcpy needs to put, at a bare minimum, the null terminator but, in your case, also the results of your strtok as well.
The reason it may work on a different system is that some implementations of malloc will allocate at a certain resolution (e.g., a multiple of 16 bytes) no matter what actual value you ask for, so you may have some free space there at the end of your buffer. But what you're attempting is still very much undefined behaviour.
The fact that the undefined behaviour may be to work sometimes in no way abrogates your responsibility to avoid such behaviour.
Allocate enough space for storing the results of your strtok and you should be okay.
The safest way to do this is to dynamically allocate the space so that it's at least as big as the string you're passing to strtok. That way there can be no possibility of overflow (other than weird edge cases where other threads may modify the data behind your back but, if that were the case, strtok would be a very bad choice anyway).
Something like (if instr is your original input string):
cond = (char*)malloc(strlen(instr)+1);
This guarantees that any token extracted from instr will fit within cond.
As an aside, sizeof(char) is always 1 by definition, so you don't need to multiply by it.
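A sketch of this sizing rule applied to the "while ( cond1 ){" line from the question. The function name split_keyword_cond, its return convention, and the working copy of the line are assumptions; the copy is there because strtok modifies the string it scans.

```c
#include <stdlib.h>
#include <string.h>

/* Extracts the first two tokens of line (e.g. "while" and "cond1") into
   newly allocated strings. Returns 0 on success, -1 on failure.
   On success the caller frees *keyword and *cond. */
int split_keyword_cond(const char *line, char **keyword, char **cond)
{
    static const char delims[] = ": \t(){";
    size_t n = strlen(line) + 1;     /* no token can be longer than the line */
    char *work = malloc(n);          /* scratch copy for strtok to chop up */
    char *kw = malloc(n);
    char *cd = malloc(n);
    int ok = 0;

    if (work != NULL && kw != NULL && cd != NULL) {
        char *t1, *t2;
        memcpy(work, line, n);
        t1 = strtok(work, delims);
        t2 = (t1 != NULL) ? strtok(NULL, delims) : NULL;
        if (t1 != NULL && t2 != NULL) {
            strcpy(kw, t1);          /* safe: both buffers hold n bytes */
            strcpy(cd, t2);
            ok = 1;
        }
    }
    free(work);
    if (!ok) {
        free(kw);
        free(cd);
        return -1;
    }
    *keyword = kw;
    *cond = cd;
    return 0;
}
```

Checking the strtok results for NULL before copying also covers input lines that contain fewer than two tokens.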
cond is being allocated one byte. strcpy is copying at least two bytes to that allocation. That is, you are writing more bytes into the allocation than there is room for.
One way to fix it is to use char *cond = malloc (1000); instead of what you've got.
You only allocated memory for 1 character but you are trying to store at least 6 characters (you need space for the terminating \0). The quick and dirty way to solve this is to just say
char cond[128]
instead of malloc.

Resources