So it is my understanding that a C string of, for instance "0123456789", would actually occupy an array of 11 chars, 10 chars for the body and one for the terminating null. If that is true then why does the code below NOT cause some sort of error?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char ** argv){
char * my_string = "0123456789";
/* my string should occupy 11 bytes */
int my_len = strlen(my_string);
/* however, strlen should only return 10,
because it does not count the null byte */
char * new_string = malloc(my_len);
/* allocate memory 10 bytes wide */
memcpy(new_string, my_string, my_len);
/* copy the first 10 bytes from my_string to new_string
new_string should NOT be null terminated if my understanding
is correct? */
printf("%s\n", new_string);
/* Since new_stirng is NOT null terminated it seems like this should
cause some sort of memory exception.
WHY DOES THIS NOT CAUSE AN ERROR?
*/
return 0;
}
Since new_string is not null terminated I would expect printf to just read forever until it reaches some other applications memory, or a randomly placed 0x00 somewhere and either crash or print something strange. What's going on?
You have created undefined behavior. The behavior is compiler and platform dependent. It could crash. It could work. It could make you toast. It could collapse your computer into a singularity and absorb the solar system.
In your case, it's likely that the memory at new_string[11] was already 0, which is '\0', or the terminating-null character.
Because it's undefined behavior. Undefined behavior does not mean a seg fault, although that's one possibility.
In your case, the particular memory block you got from malloc was never used before (by your program), and it was probably zero-initialized by the OS when it was allocated. Therefore, the 11th byte was likely a zero, and so no error. In a longer-running program, malloc can return a dirty chunk of memory where the 11th byte is not 0, and so you will run into problems at the printf statement. In practice, you'll either print garbage (until the first null is encountered) or segfault if there is no null before the end of the allocated region.
(As others have said, the behavior is formally undefined, I'm just explaining what you saw).
or a randomly placed 0x00 somewhere ... print something strange.
This is commonly observed behavior. What will actually happen is undefined.
This is undefined behaviour. In some cases it could cause a crash it would depend on the memory layout I would imagine
Related
So I read that strcat function is to be used carefully as the destination string should be large enough to hold contents of its own and source string. And it was true for the following program that I wrote:
#include <stdio.h>
#include <string.h>
int main(){
char *src, *dest;
printf("Enter Source String : ");
fgets(src, 10, stdin);
printf("Enter destination String : ");
fgets(dest, 20, stdin);
strcat(dest, src);
printf("Concatenated string is %s", dest);
return 0;
}
But not true for the one that I wrote here:
#include <stdio.h>
#include <string.h>
int main(){
char src[11] = "Hello ABC";
char dest[15] = "Hello DEFGIJK";
strcat(dest, src);
printf("concatenated string %s", dest);
getchar();
return 0;
}
This program ends up adding both without considering that destination string is not large enough. Why is it so?
The strcat function has no way of knowing exactly how long the destination buffer is, so it assumes that the buffer passed to it is large enough. If it's not, you invoke undefined behavior by writing past the end of the buffer. That's what's happening in the second piece of code.
The first piece of code is also invalid because both src and dest are uninitialized pointers. When you pass them to fgets, it reads whatever garbage value they contain, treats it as a valid address, then tries to write values to that invalid address. This is also undefined behavior.
One of the things that makes C fast is that it doesn't check to make sure you follow the rules. It just tells you the rules and assumes that you follow them, and if you don't bad things may or may not happen. In your particular case it appeared to work but there's no guarantee of that.
For example, when I ran your second piece of code it also appeared to work. But if I changed it to this:
#include <stdio.h>
#include <string.h>
int main(){
char dest[15] = "Hello DEFGIJK";
strcat(dest, "Hello ABC XXXXXXXXXX");
printf("concatenated string %s", dest);
return 0;
}
The program crashes.
I think your confusion is not actually about the definition of strcat. Your real confusion is that you assumed that the C compiler would enforce all the "rules". That assumption is quite false.
Yes, the first argument to strcat must be a pointer to memory sufficient to store the concatenated result. In both of your programs, that requirement is violated. You may be getting the impression, from the lack of error messages in either program, that perhaps the rule isn't what you thought it was, that somehow it's valid to call strcat even when the first argument is not a pointer to enough memory. But no, that's not the case: calling strcat when there's not enough memory is definitely wrong. The fact that there were no error messages, or that one or both programs appeared to "work", proves nothing.
Here's an analogy. (You may even have had this experience when you were a child.) Suppose your mother tells you not to run across the street, because you might get hit by a car. Suppose you run across the street anyway, and do not get hit by a car. Do you conclude that your mother's advice was incorrect? Is this a valid conclusion?
In summary, what you read was correct: strcat must be used carefully. But let's rephrase that: you must be careful when calling strcat. If you're not careful, all sorts of things can go wrong, without any warning. In fact, many style guides recommend not using functions such as strcat at all, because they're so easy to misuse if you're careless. (Functions such as strcat can be used perfectly safely as long as you're careful -- but of course not all programmers are sufficiently careful.)
The strcat() function is indeed to be used carefully because it doesn't protect you from anything. If the source string isn't NULL-terminated, the destination string isn't NULL-terminated, or the destination string doesn't have enough space, strcat will still copy data. Therefore, it is easy to overwrite data you didn't mean to overwrite. It is your responsibility to make sure you have enough space. Using strncat() instead of strcat will also give you some extra safety.
Edit Here's an example:
#include <stdio.h>
#include <string.h>
int main()
{
char s1[16] = {0};
char s2[16] = {0};
strcpy(s2, "0123456789abcdefOOPS WAY TOO LONG");
/* ^^^ purposefully copy too much data into s2 */
printf("-%s-\n",s1);
return 0;
}
I never assigned to s1, so the output should ideally be --. However, because of how the compiler happened to arrange s1 and s2 in memory, the output I actually got was -OOPS WAY TOO LONG-. The strcpy(s2,...) overwrote the contents of s1 as well.
On gcc, -Wall or -Wstringop-overflow will help you detect situations like this one, where the compiler knows the size of the source string. However, in general, the compiler can't know how big your data will be. Therefore, you have to write code that makes sure you don't copy more than you have room for.
Both snippets invoke undefined behavior - the first because src and dest are not initialized to point anywhere meaningful, and the second because you are writing past the end of the array.
C does not enforce any kind of bounds checking on array accesses - you won't get an "Index out of range" exception if you try to write past the end of an array. You may get a runtime error if you try to access past a page boundary or clobber something important like the frame pointer, but otherwise you just risk corrupting data in your program.
Yes, you are responsible for making sure the target buffer is large enough for the final string. Otherwise the results are unpredictable.
I'd like to point out what is actually happening in the 2nd program in order to illustrate the problem.
It allocates 15 bytes at the memory location starting at dest and copies 14 bytes into it (including the null terminator):
char dest[15] = "Hello DEFGIJK";
...and 11 bytes at src with 10 bytes copied into it:
char src[11] = "Hello ABC";
The strcat() call then copies 10 bytes (9 chars plus the null terminator) from src into dest, starting right after the 'K' in dest. The resulting string at dest will be 23 bytes long including the null terminator. The problem is, you allocated only 15 bytes at dest, and the memory adjacent to that memory will be overwritten, i.e. corrupted, leading to program instability, wrong results, data corruption, etc.
Note that the strcat() function knows nothing about the amount of memory you've allocated at dest (or src, for that matter). It is up to you to make sure you've allocated enough memory at dest to prevent memory corruption.
By the way, the first program doesn't allocate memory at dest or src at all, so your calls to fgets() are corrupting memory starting at those locations.
program in c language
void main()
{
char *a,*b;
a[0]='s';
a[1]='a';
a[2]='n';
a[3]='j';
a[4]='i';
a[5]='t';
printf("length of a %d/n", strlen(a));
b[0]='s';
b[1]='a';
b[2]='n';
b[3]='j';
b[4]='i';
b[5]='t';
b[6]='g';
printf("length of b %d\n", strlen(b));
}
here the output is :
length of a 6
length of b 12
Why and please explain it.
thanks in advance.
You are assigning to pointer (which contains garbage) without allocating memory. What you are noticing is Undefined Behavior. Also main should return an int. Also it does not make sense to try and find the length of an array of chars which are not nul terminated.
This is how you can go about:
Sample code
When you declare any variable it comes with whatever it had in memory previously where your application is running, and since pointers are essentially numbers, whatever number it had referenced to some random memory address.
Then, when setting a[i], the compiler interprets that as you want to step sizeof(a) bytes forward, thus, a[i] is equal to the address (a + i*1) (1 because chars use one byte).
Finally, C-strings need to be NULL terminated (\0, also known as sentinel), and methods like strlen go over the length of the string until hitting the sentinel, most likely, your memory had a stray 0 somewhere that caused strlen to stop.
Allocate some memory and terminate the strings then it will work better
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void main(){
char *a=malloc(10);
char *b=malloc(10);
if(a){
a[0]='s';
a[1]='a';
a[2]='n';
a[3]='j';
a[4]='i';
a[5]='t';
a[6]=(char)0;
printf("length of a %d\n", (int)strlen(a));
}else{
printf("Failed to allocate 10 bytes\n" );
}
if(b){
b[0]='s';
b[1]='a';
b[2]='n';
b[3]='j';
b[4]='i';
b[5]='t';
b[6]='g';
b[7]=(char)0;
printf("length of b %d\n", (int)strlen(b));
}else{
printf("Failed to allocate 10 bytes\n" );
}
free(a);
free(b);
}
Undefined behavior. That's all.
You're using an uninitialized pointer. After that, all bets are off as to what will happen.
Of course, we can attempt to explain why your particular implementation acts in a certain way but it'd be quite pointless outside of novelty.
The indexing operator is de-referencing the pointers a and b, but you never initialized those pointers to point at valid memory. Writing to un-initialized memory triggers undefined behavior.
You are simply "lucky" (or unlucky, it depends on your viewpoint) that the program doesn't crash, that the pointer values are such that you succeed in writing at those locations.
Note that you never write the termination character ('\0') to either string, but still get the "right" value from strlen(); this implies that a and b both point at memory that happens to be full of zeros. More luck.
This is a very broken program; that it manages to run "successfully" is because it's behavior is undefined, and undefined clearly includes "working as the programmer intended".
a and b are both char pointers. First of all, you didn't initialise them and secondly didn't terminate them with NULL.
I found this code working perfectly.
#include <stdio.h>
#include <stdlib.h>
int main(int argc,char *argv[])
{
char* s; /* input string */
s=malloc(sizeof(s));
int c;
if(argc==1){ // if file name not given
while (gets(s)){
puts(s);
}
}
}
What I don't understand is, how is the string s stored in memory.i am allocating memory only for the pointer s, which is of 4 bytes.Now where does the input string given by the user get stored in?
It's only safe for the first four bytes. The fifth byte will overrun the allocated data and tramp on something else which will yield undefined behaviour (might crash, might not).
Also, you don't null terminate the string with '\0' after you finish writing the chars, so you'll probably introduce another crash when you try and call a string routine (strcpy) on it - unless the memory after your string happened to contain zeros anyway, but naturally you shouldn't rely on this chance!
Rather than this you should do
s=malloc(sizeof(*s)*(number_of_chars+1));
You set number_of_chars to appropriate value, so that you allocate memory to store those many characters. +1 is for last '\0' character.
With your approach you are allocating 4 bytes so you can store usually those many characters.
You've allocated sizeof(void*) bytes of memory and filling it with user-provided data. You have an address and writing to it, it's ok from compiler's point of view (maybe it's really what you want, who knows). Even if you program didn't crash when you exceed it - it's still an error. It's just a memory, something else could be stored in this area, and you'll overwrite it - so expect heavy trouble if you'll ever do that.
It's possible as compiler assigns two bytes.
now you give 10 bytes in input, so your allocated memory overflows and data stored beyond your allocated memory only if its available.
It might give error if the data you want to store is greater then available and not give error if the data you want to store is greater then allocated.
puts will print data until it gets '\0'.
So this is expected behavior!!
Why does the below C code using strcpy work just fine for me? I tried to make it fail in two ways:
1) I tried strcpy from a string literal into allocated memory that was too small to contain it. It copied the whole thing and didn't complain.
2) I tried strcpy from an array that was not NUL-terminated. The strcpy and the printf worked just fine. I had thought that strcpy copied chars until a NUL was found, but none was present and it still stopped.
Why don't these fail? Am I just getting "lucky" in some way, or am I misunderstanding how this function works? Is it specific to my platform (OS X Lion), or do most modern platforms work this way?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
char *src1 = "123456789";
char *dst1 = (char *)malloc( 5 );
char src2[5] = {'h','e','l','l','o'};
char *dst2 = (char *)malloc( 6 );
printf("src1: %s\n", src1);
strcpy(dst1, src1);
printf("dst1: %s\n", dst1);
strcpy(dst2, src2);
printf("src2: %s\n", src2);
dst2[5] = '\0';
printf("dst2: %s\n", dst2);
return 0;
}
The output from running this code is:
$ ./a.out
src1: 123456789
dst1: 123456789
src2: hello
dst2: hello
First, copying into an array that is too small:
C has no protection for going past array bounds, so if there is nothing sensitive at dst1[5..9], then you get lucky, and the copy goes into memory that you don't rightfully own, but it doesn't crash either. However, that memory is not safe, because it has not been allocated to your variable. Another variable may well have that memory allocated to it, and later overwrite the data you put in there, corrupting your string later on.
Secondly, copying from an array that is not null-terminated:
Even though we're usually taught that memory is full of arbitrary data, huge chunks of it are zero'd out. Even though you didn't put a null-terminator in src2, chances are good that src[5] happens to be \0 anyway. This makes the copy succeed. Note that this is NOT guaranteed, and could fail on any run, on any platform, at anytime. But you got lucky this time (and probably most of the time), and it worked.
Overwriting beyond the bounds of allocated memory causes Undefined Behavior.
So in a way yes you got lucky.
Undefined behavior means anything can happen and the behavior cannot be explained as the Standard, which defines the rules of the language, does not define any behavior.
EDIT:
On Second thoughts, I would say you are really Unlucky here that the program works fine and does not crash. It works now does not mean it will work always, In fact it is a bomb ticking to blow off.
As per Murphy's Law:
"Anything that can go wrong will go wrong"["and most likely at the most inconvenient possible moment"]
[ ]- Is my edit to the Law :)
Yes, you're quite simply getting lucky.
Typically, the heap is contiguous. This means that when you write past the malloced memory, you could be corrupting the following memory block, or some internal data structures that may exist between user memory blocks. Such corruption often manifests itself long after the offending code, which makes debugging this type of bugs difficult.
You're probably getting the NULs because the memory happens to be zero-filled (which isn't guaranteed).
As #Als said, this is undefined behaviour. This may crash, but it doesn't have to.
Many memory managers allocate in larger chunks of memory and then hand it to the "user" in smaller chunks, probably a mutliple of 4 or 8 bytes. So your write over the boundary probably simply writes into the extra bytes allocated. Or it overwrites one of the other variables you have.
You're not malloc-ing enough bytes there. The first string, "123456789" is 10 bytes (the null terminator is present), and {'h','e','l','l','o'} is 6 bytes (again, making room for the null terminator). You're currently clobbering the memory with that code, which leads to undefined (i.e. odd) behavior.
I am learning how to use pointers, so i wrote the below program to assign integer values in the interval [1,100] to some random locations in the memory.
When i read those memory locations, printf displays all the values and then gives me a segmentation fault. This seems an odd behavior, because i was hoping to see either all the values OR a seg fault, but not both at the same time.
Can someone please explain why i got to see both?
Thanks. Here is the code
#include <stdio.h>
#include <stdlib.h>
int main()
{
char first = 'f';
char *ptr_first = &first;
int i=1;
for(i=1;i<101;i++)
*(ptr_first+i) = i;
for(i=1;i<101;i++)
printf("%d\n", *(ptr_first+i));
return EXIT_SUCCESS;
}
Not odd at all. You are using your variable first, which is on the stack. What you essentially do is happily overwriting the stack (otherwise known from buffer overflows on the stack) and thus probably destroying any return address and so on.
Since main is called by the libc, the return to libc would cause the crash.
You're accessing memory past beyond that assigned to first. It is just one character, and, through the ptr_first pointer, you're accessing 100 positions past this character to unreserved memory. This may lead to segfaults.
You have to ensure the original variable has enough memory reserved for the pointer accesses. For example:
char first[100];
This will convert first in an array of 100 chars (basically a memory space of 100 bytes that you can access via pointer).
Note also that you're inserting int into the char pointer. This will work, but the value of the int will be truncated. You should be using char as the type of i.
since ptr_first pointer is pointing to a char variable first. Now when you are incrementing ptr_first, so incremented memory address location can be out of process memory address space, thats why kernel is sending segmentation fault to this process.