I was just going through C library functions to see what I can do with them. When I came across the strcpy function the code I wrote resulted in a segmentation fault and I would like to know why. The code I wrote should be printing WorldWorld. If I understood correctly, strcpy(x,y) will copy the contents of y into x.
main() {
char *x = "Hello";
char *y = "World";
printf(strcpy(x,y));
}
If it worked, the code you wrote would print "World", not "WorldWorld". Nothing is appended, strcpy overwrites data only.
Your program crashes because "Hello" and "World" are string constants. It's undefined behavior to attempt to write to a constant, and in your case this manifests as a segmentation fault. You should use char x[] = "Hello"; and char y[] = "World"; instead, which reserve memory on the stack to hold the strings, where they can be overwritten.
There are more problems with your program, though:
First, you should never pass a variable string as the first argument to printf: either use puts, or use printf("%s", string). Passing a variable as a format string prevents compilers that support type-checking printf arguments from doing that verification, and it can transform into a serious vulnerability if users can control it.
Second, you should never use strcpy. Strcpy will happily overrun buffers, which is another major security vulnerability. For instance, if you wrote:
char foo[] = "foo";
strcpy(foo, "this string is waaaaaay too long");
return;
you will cause undefined behavior, your program would crash again, and you're opening the door to other serious vulnerabilities that you can avoid by specifying the size of the destination buffer.
AFAIK, there is actually no standard C function that will decently copy strings, but the least bad one would be strlcpy, which additionally requires a size argument.
Related
So I read that strcat function is to be used carefully as the destination string should be large enough to hold contents of its own and source string. And it was true for the following program that I wrote:
#include <stdio.h>
#include <string.h>
int main(){
char *src, *dest;
printf("Enter Source String : ");
fgets(src, 10, stdin);
printf("Enter destination String : ");
fgets(dest, 20, stdin);
strcat(dest, src);
printf("Concatenated string is %s", dest);
return 0;
}
But not true for the one that I wrote here:
#include <stdio.h>
#include <string.h>
int main(){
char src[11] = "Hello ABC";
char dest[15] = "Hello DEFGIJK";
strcat(dest, src);
printf("concatenated string %s", dest);
getchar();
return 0;
}
This program ends up adding both without considering that destination string is not large enough. Why is it so?
The strcat function has no way of knowing exactly how long the destination buffer is, so it assumes that the buffer passed to it is large enough. If it's not, you invoke undefined behavior by writing past the end of the buffer. That's what's happening in the second piece of code.
The first piece of code is also invalid because both src and dest are uninitialized pointers. When you pass them to fgets, it reads whatever garbage value they contain, treats it as a valid address, then tries to write values to that invalid address. This is also undefined behavior.
One of the things that makes C fast is that it doesn't check to make sure you follow the rules. It just tells you the rules and assumes that you follow them, and if you don't bad things may or may not happen. In your particular case it appeared to work but there's no guarantee of that.
For example, when I ran your second piece of code it also appeared to work. But if I changed it to this:
#include <stdio.h>
#include <string.h>
int main(){
char dest[15] = "Hello DEFGIJK";
strcat(dest, "Hello ABC XXXXXXXXXX");
printf("concatenated string %s", dest);
return 0;
}
The program crashes.
I think your confusion is not actually about the definition of strcat. Your real confusion is that you assumed that the C compiler would enforce all the "rules". That assumption is quite false.
Yes, the first argument to strcat must be a pointer to memory sufficient to store the concatenated result. In both of your programs, that requirement is violated. You may be getting the impression, from the lack of error messages in either program, that perhaps the rule isn't what you thought it was, that somehow it's valid to call strcat even when the first argument is not a pointer to enough memory. But no, that's not the case: calling strcat when there's not enough memory is definitely wrong. The fact that there were no error messages, or that one or both programs appeared to "work", proves nothing.
Here's an analogy. (You may even have had this experience when you were a child.) Suppose your mother tells you not to run across the street, because you might get hit by a car. Suppose you run across the street anyway, and do not get hit by a car. Do you conclude that your mother's advice was incorrect? Is this a valid conclusion?
In summary, what you read was correct: strcat must be used carefully. But let's rephrase that: you must be careful when calling strcat. If you're not careful, all sorts of things can go wrong, without any warning. In fact, many style guides recommend not using functions such as strcat at all, because they're so easy to misuse if you're careless. (Functions such as strcat can be used perfectly safely as long as you're careful -- but of course not all programmers are sufficiently careful.)
The strcat() function is indeed to be used carefully because it doesn't protect you from anything. If the source string isn't NULL-terminated, the destination string isn't NULL-terminated, or the destination string doesn't have enough space, strcat will still copy data. Therefore, it is easy to overwrite data you didn't mean to overwrite. It is your responsibility to make sure you have enough space. Using strncat() instead of strcat will also give you some extra safety.
Edit Here's an example:
#include <stdio.h>
#include <string.h>
int main()
{
char s1[16] = {0};
char s2[16] = {0};
strcpy(s2, "0123456789abcdefOOPS WAY TOO LONG");
/* ^^^ purposefully copy too much data into s2 */
printf("-%s-\n",s1);
return 0;
}
I never assigned to s1, so the output should ideally be --. However, because of how the compiler happened to arrange s1 and s2 in memory, the output I actually got was -OOPS WAY TOO LONG-. The strcpy(s2,...) overwrote the contents of s1 as well.
On gcc, -Wall or -Wstringop-overflow will help you detect situations like this one, where the compiler knows the size of the source string. However, in general, the compiler can't know how big your data will be. Therefore, you have to write code that makes sure you don't copy more than you have room for.
Both snippets invoke undefined behavior - the first because src and dest are not initialized to point anywhere meaningful, and the second because you are writing past the end of the array.
C does not enforce any kind of bounds checking on array accesses - you won't get an "Index out of range" exception if you try to write past the end of an array. You may get a runtime error if you try to access past a page boundary or clobber something important like the frame pointer, but otherwise you just risk corrupting data in your program.
Yes, you are responsible for making sure the target buffer is large enough for the final string. Otherwise the results are unpredictable.
I'd like to point out what is actually happening in the 2nd program in order to illustrate the problem.
It allocates 15 bytes at the memory location starting at dest and copies 14 bytes into it (including the null terminator):
char dest[15] = "Hello DEFGIJK";
...and 11 bytes at src with 10 bytes copied into it:
char src[11] = "Hello ABC";
The strcat() call then copies 10 bytes (9 chars plus the null terminator) from src into dest, starting right after the 'K' in dest. The resulting string at dest will be 23 bytes long including the null terminator. The problem is, you allocated only 15 bytes at dest, and the memory adjacent to that memory will be overwritten, i.e. corrupted, leading to program instability, wrong results, data corruption, etc.
Note that the strcat() function knows nothing about the amount of memory you've allocated at dest (or src, for that matter). It is up to you to make sure you've allocated enough memory at dest to prevent memory corruption.
By the way, the first program doesn't allocate memory at dest or src at all, so your calls to fgets() are corrupting memory starting at those locations.
I am bit confused when to allocate memory to a char * and when to point it to a const string.
Yes, I understand that if I wish to modify the string, I need to allocate it memory.
But in cases when I don't wish to modify the string to which I point and just need to pass the value should I just do the below? What are the disadvantages in the below steps as compared to allocating memory with malloc?
char *str = NULL;
str = "This is a test";
str = "Now I am pointing here";
Let's try again your example with the -Wwrite-strings compiler warning flag, you will see a warning:
warning: initialization discards 'const' qualifier from pointer target type
This is because the type of "This is a test" is const char *, not char *. So you are losing the constness information when you assign the literal address to the pointer.
For historical reasons, compilers will allow you to store string literals which are constants in non-const variables.
This is, however, a bad behavior and I suggest you to use -Wwrite-strings all the time.
If you want to prove it for yourself, try to modify the string:
char *str = "foo";
str[0] = 'a';
This program behavior is undefined but you may see a segmentation fault on many systems.
Running this example with Valgrind, you will see the following:
Process terminating with default action of signal 11 (SIGSEGV)
Bad permissions for mapped region at address 0x4005E4
The problem is that the binary generated by your compiler will store the string literals in a memory location which is read-only. By trying to write in it you cause a segmentation fault.
What is important to understand is that you are dealing here with two different systems:
The C typing system which is something to help you to write correct code and can be easily "muted" (by casting, etc.)
The Kernel memory page permissions which are here to protect your system and which shall always be honored.
Again, for historical reasons, this is a point where 1. and 2. do not agree. Or to be more clear, 1. is much more permissive than 2. (resulting in your program being killed by the kernel).
So don't be fooled by the compiler, the string literals you are declaring are really constant and you cannot do anything about it!
Considering your pointer str read and write is OK.
However, to write correct code, it should be a const char * and not a char *. With the following change, your example is a valid piece of C:
const char *str = "some string";
str = "some other string";
(const char * pointer to a const string)
In this case, the compiler does not emit any warning. What you write and what will be in memory once the code is executed will match.
Note: A const pointer to a const string being const char *const:
const char *const str = "foo";
The rule of thumb is: always be as constant as possible.
If you need to modify the string, use dynamic allocation (malloc() or better, some higher level string manipulation function such as strdup, etc. from the libc), if you don't need to, use a string literal.
If you know that str will always be read-only, why not declare it as such?
char const * str = NULL;
/* OR */
const char * str = NULL;
Well, actually there is one reason why this may be difficult - when you are passing the string to a read-only function that does not declare itself as such. Suppose you are using an external library that declares this function:
int countLettersInString(char c, char * str);
/* returns the number of times `c` occurs in `str`, or -1 if `str` is NULL. */
This function is well-documented and you know that it will not attempt to change the string str - but if you call it with a constant string, your compiler might give you a warning! You know there is nothing dangerous about it, but your compiler does not.
Why? Because as far as the compiler is concerned, maybe this function does try to modify the contents of the string, which would cause your program to crash. Maybe you rely very heavily on this library and there are lots of functions that all behave like this. Then maybe it's easier not to declare the string as const in the first place - but then it's all up to you to make sure you don't try to modify it.
On the other hand, if you are the one writing the countLettersInString function, then simply make sure the compiler knows you won't modify the string by declaring it with const:
int countLettersInString(char c, char const * str);
That way it will accept both constant and non-constant strings without issue.
One disadvantage of using string-literals is that they have length restrictions.
So you should keep in mind from the document ISO/IEC:9899
(emphasis mine)
5.2.4.1 Translation limits
1 The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits:
[...]
— 4095 characters in a character string literal or wide string literal (after concatenation)
So If your constant text exceeds this count (What some times throughout may be possible, especially if you write a dynamic webserver in C) you are forbidden to use the string literal approach if you want to stay system independent.
There is no problem in your code as long as you are not planing to modify the contents of that string. Also, the memory for such string literals will remain for the full life time of the program. The memory allocated by malloc is read-write, so you can manipulate the contents of that memory.
If you have a string literal that you do not want to modify, what you are doing is ok:
char *str = NULL;
str = "This is a test";
str = "Now I am pointing here";
Here str a pointer has a memory which it points to. In second line you write to that memory "This is a test" and then again in 3 line you write in that memory "Now I am pointing here". This is legal in C.
You may find it a bit contradicting but you can't modify string that is something like this -
str[0]='X' // will give a problem.
However, if you want to be able to modify it, use it as a buffer to hold a line of input and so on, use malloc:
char *str=malloc(BUFSIZE); // BUFSIZE size what you want to allocate
free(str); // freeing memory
Use malloc() when you don't know the amount of memory needed during compile time.
It is legal in C unfortunately, but any attempt to modify the string literal via the pointer will result in undefined behavior.
Say
str[0] = 'Y'; //No compiler error, undefined behavior
It will run fine, but you may get a warning by the compiler, because you are pointing to a constant string.
P.S.: It will run OK only when you are not modifying it. So the only disadvantage of not using malloc is that you won't be able to modify it.
I came upon this weird behaviour when dealing with C strings. This is an exercise from the K&R book where I was supposed to write a function that appends one string onto the end of another string. This obviously requires the destination string to have enough memory allocated so that the source string fits. Here is the code:
/* strcat: Copies contents of source at the end of dest */
char *strcat(char *dest, const char* source) {
char *d = dest;
// Move to the end of dest
while (*dest != '\0') {
dest++;
} // *dest is now '\0'
while (*source != '\0') {
*dest++ = *source++;
}
*dest = '\0';
return d;
}
During testing I wrote the following, expecting a segfault to happen while the program is running:
int main() {
char s1[] = "hello";
char s2[] = "eheheheheheh";
printf("%s\n", strcat(s1, s2));
}
As far as I understand s1 gets an array of 6 chars allocated and s2 an array of 13 chars. I thought that when strcat tries to write to s1 at indexes higher than 6 the program would segfault. Instead everything works fine, but the program doesn't exit cleanly, instead it does:
helloeheheheheheh
zsh: abort ./a.out
and exits with code 134, which I think just means abort.
Why am I not getting a segfault (or overwriting s2 if the strings are allocated on the stack)? Where are these strings in memory (the stack, or the heap)?
Thanks for your help.
I thought that when strcat tries to write to s1 at indexes higher than 6 the program would segfault.
Writing outside the bounds of memory you have allocated on the stack is undefined behaviour. Invoking this undefined behaviour usually (but not always) results in a segfault. However, you can't be sure that a segfault will happen.
The wikipedia link explains it quite nicely:
When an instance of undefined behavior occurs, so far as the language specification is concerned anything could happen, maybe nothing at all.
So, in this case, you could get a segfault, the program could abort, or sometimes it could just run fine. Or, anything. There is no way of guaranteeing the result.
Where are these strings in memory (the stack, or the heap)?
Since you've declared them as char [] inside main(), they are arrays that have automatic storage, which for practical purposes means they're on the stack.
Edit 1:
I'm going to try and explain how you might go about discovering the answer for yourself. I'm not sure what actually happens as this is not defined behavior (as others have stated), but you can do some simple debugging to figure out what your compiler is actually doing.
Original Answer
My guess would be that they are both on the stack. You can check this by modifying your code with:
int main() {
char c1 = 'X';
char s1[] = "hello";
char s2[] = "eheheheheheh";
char c2 = '3';
printf("%s\n", strcat(s1, s2));
}
c1 and c2 are going to be on the stack. Knowing that you can check if s1 and s2 are as well.
If the address of c1 is less than s1 and the address of s1 is less than c2 then it is on the stack. Otherwise it is probably in your .bss section (which would be the smart thing to do but would break recursion).
The reason I'm banking on the strings being on the stack is that if you are modifying them in the function, and that function calls itself, then the second call would not have its own copy of the strings and hence would not be valid... However, the compiler still knows that this function isn't recursive and can put the strings in the .bss so I could be wrong.
Assuming my guess that it is on the stack is right, in your code
int main() {
char s1[] = "hello";
char s2[] = "eheheheheheh";
printf("%s\n", strcat(s1, s2));
}
"hello" (with the null terminator) is pushed onto the stack, followed by "eheheheheheh" (with the null terminator).
They are both located one after the other (thanks to plain luck of the order in which you wrote them) forming a single memory block that you can write to (but shouldn't!)... That's why there is no seg fault, you can see this by breaking before printf and looking at the addresses.
s2 == (uintptr_t)s1 + (strlen(s1) + 1) should be true if I'm right.
Modifying your code with
int main() {
char s1[] = "hello";
char c = '3';
char s2[] = "eheheheheheh";
printf("%s\n", strcat(s1, s2));
}
Should see c overwritten if I'm right...
However, if I'm wrong and it is in the .bss section then they could still be adjacent and you would be overwriting them without a seg fault.
If you really want to know, disassemble it:
Unfortunately I only know how to do it on Linux. Try using the nm <binary> > <text file>.txt command or objdump -t <your_binary> > <text file>.sym command to dump all the symbols from your program. The commands should also give you the section in which each symbol resides.
Search the file for the s1 and s2 symbols, if you don't find them it should mean that they are on the stack but we will check that in the next step.
Use the objdump -S your_binary > text_file.S command (make sure you built your binary with debug symbols) and then open the .S file in a text editor.
Again search for the s1 and s2 symbols, (hopefully there aren't any others, I suspect not but I'm not sure).
If you find their definitions followed by a push or sub %esp command, then they are on the stack. If you're unsure about what their definitions mean, post it back here and let us have a look.
There's no seg fault or even an overwrite because it can use the memory of the second string and still function. Even give the correct answer. The abort is a sign that the program realized something was wrong. Try reversing the order in which you declare the strings and try again. It probably won't be as pleasant.
int main() {
char s1[] = "hello";
char s2[] = "eheheheheheh";
printf("%s\n", strcat(s1, s2));
}
instead use:
int main() {
char s1[20] = "hello";
char s2[] = "eheheheheheh";
printf("%s\n", strcat(s1, s2));
}
Here is the reason why your program didn't crash:
Your strings are declared as array (s1[] and s2[]). So they're on the stack. And just so happens that memory for s2[] is right after s1[]. So when strcat() is called, all it does is moving each character in s2[] one byte forward. Stack as stack is readable and writable. So there is no restriction what you'e doing.
But I believe the compiler is free to locate s1[] and s2[] where it see fits so this is just a happy accident.
Now to get your program to crash is relatively easy
Swap s1 and s2 in your call: instead of strcat(s1, s2), do strcat(s2, s1). This should cause stack smashing exception.
Change s1[] and s2[] to *s1 and *s2. This should cause segfault when you're writing to readonly segment.
hmm.... the strings are in stack all right since heap is used only for dynamic allocation of memory and stuff..
segfault is for invalid memory access, but with this array you are just writing stuff which is going out of bound (outside the boundry) for the array , so while writing i dont think you will have a issue .... Since in C its actually left to the programer to ensure things are kept in bound for arrays.
Also while reading if you use pointers - I dont think there will be a issue either since you can just continue to read till where ever you want and using the sum of previous lengths. But if you use functions that are mentioned in string.h they relay on the presence of the null character "\0" to decide where to halt the operation -- hence i think your function worked !!
but the termination could also indicate that any other variable / something that might have been present next to the location of the strings might have got over written with char value .... accessing those might have caused the program to exit !!
hope this helps .... good question by the way !
I am trying to understand why the following snippet of code is giving a segmentation fault:
void tokenize(char* line)
{
char* cmd = strtok(line," ");
while (cmd != NULL)
{
printf ("%s\n",cmd);
cmd = strtok(NULL, " ");
}
}
int main(void)
{
tokenize("this is a test");
}
I know that strtok() does not actually tokenize on string literals, but in this case, line points directly to the string "this is a test" which is internally an array of char. Is there any of tokenizing line without copying it into an array?
The problem is that you're attempting to modify a string literal. Doing so causes your program's behavior to be undefined.
Saying that you're not allowed to modify a string literal is an oversimplification. Saying that string literals are const is incorrect; they're not.
WARNING : Digression follows.
The string literal "this is a test" is of an expression of type char[15] (14 for the length, plus 1 for the terminating '\0'). In most contexts, including this one, such an expression is implicitly converted to a pointer to the first element of the array, of type char*.
The behavior of attempting to modify the array referred to by a string literal is undefined -- not because it's const (it isn't), but because the C standard specifically says that it's undefined.
Some compilers might permit you to get away with this. Your code might actually modify the static array corresponding to the literal (which could cause great confusion later on).
Most modern compilers, though, will store the array in read-only memory -- not physical ROM, but in a region of memory that's protected from modification by the virtual memory system. The result of attempting to modify such memory is typically a segmentation fault and a program crash.
So why aren't string literals const? Since you really shouldn't try to modify them, it would certainly make sense -- and C++ does make string literals const. The reason is historical. The const keyword didn't exist before it was introduced by the 1989 ANSI C standard (though it was probably implemented by some compilers before that). So a pre-ANSI program might look like this:
#include <stdio.h>
print_string(s)
char *s;
{
printf("%s\n", s);
}
main()
{
print_string("Hello, world");
}
There was no way to enforce the fact that print_string isn't allowed to modify the string pointed to by s. Making string literals const in ANSI C would have broken existing code, which the ANSI C committee tried very hard to avoid doing. There hasn't been a good opportunity since then to make such a change to the language. (The designers of C++, mostly Bjarne Stroustrup, weren't as concerned about backward compatibility with C.)
There's a very good reason that trying to tokenize a compile-time constant string will cause a segmentation fault: the constant string is in read-only memory.
The C compiler bakes compile-time constant strings into the executable, and the operating system loads them into read-only memory (.rodata in a *nix ELF file). Since this memory is marked as read-only, and since strtok writes into the string that you pass into it, you get a segmentation fault for writing into read-only memory.
As you said, you can't modify a string literal, which is what strtok does. You have to do
char str[] = "this is a test";
tokenize(str);
This creates the array str and initialises it with this is a test\0, and passes a pointer to it to tokenize.
Strok modifies its first argument in order to tokenize it. Hence you can't pass it a literal string, as it's of type const char * and cannot be modified, hence the undefined behaviour. You have to copy the string literal into a char array that can be modified.
What point are you trying to make by your "...is internally an array of char" remark?
The fact that "this is a test" is internally an array of char does not change anything at all. It is still a string literal (all string literals are non-modifiable arrays of char). Your strtok still tries to tokenize a string literal. This is why it crashes.
I'm sure you'll get beaten up about this... but "strtok()" is inherently unsafe and prone to things like access violations.
Here, the answer is almost certainly using a string constant.
Try this instead:
void tokenize(char* line)
{
char* cmd = strtok(line," ");
while (cmd != NULL)
{
printf ("%s\n",cmd);
cmd = strtok(NULL, " ");
}
}
int main(void)
{
char buff[80];
strcpy (buff, "this is a test");
tokenize(buff);
}
I have also big trouble with this error.
I found a simple solution.
please include <string.h>
it will remove strtok segmentation fault error.
I just hit the Segmentation Fault error from trying to use printf to print the token (cmd in your case) after it became NULL.
I have used this line of code many times (update: when string was a parameter to the function!), however when I try to do it now I get a bus error (both with gcc and clang). I am reproducing the simplest possible code;
char *string = "this is a string";
char *p = string;
p++;
*p='x'; //this line will cause the Bus error
printf("string is %s\n",string);
Why am I unable to change the second character of the string using the p pointer?
You are trying to modify read only memory (where that string literal is stored). You can use a char array instead if you need to modify that memory.
char str[] = "This is a string";
str[0] = 'S'; /* works */
I have used this line of code many times..
I sure hope not. At best you would get a segfault (I say "at best" because attempting to modify readonly memory is unspecified behavior, in which case anything can happen, and a crash is the best thing that can happen).
When you declare a pointer to a string literal it points to read only memory in the data segment (look at the assembly output if you like). Declaring your type as a char[] will copy that literal onto the function's stack, which will in turn allow it to be modified if needed.