#include<string.h>
#include<stdio.h>
void main()
{
char *str1="hello";
char *str2="world";
strcat(str2,str1);
printf("%s",str2);
}
When I run this program, it is terminated at run time.
Please help me.
If I use this:
char str1[]="hello";
char str2[]="world";
then it is working!
But why does this
char *str1="hello";
char *str2="world";
not work?
You are learning from a bad book. The main function should be declared as
int main (void);
Declaring it as void invokes undefined behaviour when the application finishes. Well, it doesn't finish yet, but eventually it will.
Get a book about the C language. You would find that
char *srt1="hello";
is compiled as if you wrote
static const char secret_array [6] = { 'h', 'e', 'l', 'l', 'o', 0 };
char* srt1 = (char*) &secret_array [0];
while
char srt1[]="hello";
is compiled as if you wrote
char srt1 [6] = { 'h', 'e', 'l', 'l', 'o', 0 };
Both strcat calls are severe bugs, because the destination of the strcat call doesn't have enough memory to contain the result. The first call is also a bug because you try to modify constant memory. In the first case, the bug leads to a crash, which is a good thing and lucky for you. In the second case the bug isn't immediately detected. Which is bad luck. You can bet that if you use code like this in a program that is shipped to a customer, it will crash if you are lucky, and lead to incorrect results that will cost your customer lots of money and get you sued otherwise.
In your code,
char *srt1="hello";
you created a pointer and pointed it at a constant string. The compiler puts that string in a part of memory that is marked as read-only.
So strcat will try to modify it, which causes undefined behaviour.
A declaration such as char *p = "hello"; creates two objects: a read-only array of char holding the characters, which has no name and has static storage duration (meaning that it lives for the entire life of the program); and a variable of type pointer-to-char, called p, which is initialized with the location of the first character in that unnamed, read-only array.
See my answer for an explanation of why it works in the one case and not in the other.
String literals are read-only, you cannot change them.
This:
char *srt2="world";
means srt2 (bad name, btw) is a pointer variable, pointing at memory containing the constant data "world" (and a terminating '\0' character). There's no additional room after the six characters, and you cannot even change the letters.
You need:
char str2[32] = "world";
This, on the other hand, makes str2 be an array of 32 characters, where the first 6 characters are initialized to "world" and a terminating '\0'. It's fine to append to this, since new characters can fit into the existing array, as long as you don't overstep and try to store more than 32 characters (including the terminator).
char *srt1="hello";
char *srt2="world";
When you declare strings like this, they are stored in read-only memory.
You can't change data stored in read-only memory.
When you call strcat, it tries to modify the string, which is in read-only memory. That is not allowed here; the behaviour is undefined.
In your program, str1 and str2 are not supposed to be written to because they're declared as pointer to read-only memory. To see this, do 'objdump -s a.out' and you'll see the following:
Contents of section .rodata:
400640 01000200 68656c6c 6f00776f 726c6400 ....hello.world.
400650 257300 %s.
strcat tries to write to that part of memory, thus causing a segmentation fault.
You can also use an array. Here is a simple program of mine where I faced a problem like this. In this case you have to declare the size of the array, and it must be large enough for the final string plus the terminating '\0'. Like,
char cname[10]
#include <stdio.h>
#include <string.h>
int main()
{
char cname[10] = "mahe";
strcat(cname, "Karim");
printf("%s\n", cname);
return 0;
}
I was making a basic program of strings and did this. There is a string in this way:
#include<stdio.h>
int main()
{
char str[7]="network";
printf("%s",str);
return 0;
}
It prints network. In my view, it should not print network; some garbage value should be printed, because no '\0' ends this character array. So how did it get printed? There were no warnings or errors either.
That's because
char str[7]="network";
is the same as
char str[7]={'n','e','t','w','o','r','k'};
str is a valid char array, but not a string, because it's not null-terminated. So it's undefined behavior to use %s to print it.
Reference: C FAQ: Is char a[3] = "abc"; legal? What does it mean?
char str[7]="network";
Printing this with %s invokes undefined behavior.
You did not declare the array with enough space for the terminating '\0'.
This should be
char str[8]="network";
char str[7]="network";
You did not provide enough space for the string and its terminating '\0'. It should be
char str[8]="network";
It's possible that stack pages start off completely zeroed in your system, so the string is actually null-terminated in memory, but not thanks to your code.
Try looking at the program in memory using a debugger, reading your platform documentation or printing out the contents of str[7] to get some clues. Doing so invokes undefined behavior but it's irrelevant when you're trying to figure out what your specific compiler and OS are doing at one given point in time.
This is my simple program:
int main(int argc, const char * argv[])
{
char s [5] = "Hello";
printf("%s", s);
return 0;
}
Hello is 5 characters long, so I defined char s [5], but the output is:
Hellop\370\277_\377
And when I change the char s [5] to char s [], everything works fine. What's the problem with my code?
You're forgetting the null terminator.
C strings are expected to have one \0 byte on the end to mark the end of the string. The reason your output looks strange is because printf, looking for the null terminator, has wandered off into uninitialized memory, which triggers undefined behavior.
In this case, printf appears to somewhat luckily find a null pretty quickly, and terminate normally after printing some garbage. However, this kind of bug will often crash your program with the message segmentation fault. A "seg fault" occurs when the operating system kills your process because it's doing something it's not supposed to, like reading memory that doesn't belong to it.
Try this instead:
char s[] = "Hello"; //s is now 6 characters long.
By not providing a number, the compiler decides how big your array needs to be and copies the "Hello" data into it.
If you need a string you don't want to change, you should declare it this way instead:
const char* s = "Hello";
This way, you've created a pointer that points to static memory, containing the string "Hello", no copy needed.
consider the following code:
t[7] = "Hellow\0";
s[3] = "Dad";
//now copy t to s using the following strcpy function:
void strcpy(char *s, char *t) {
int i = 0;
while ((s[i] = t[i]) != '\0')
i++;
}
The above code is taken from the book "The C Programming Language".
my question is - we are copying 7 bytes to what was declared as 3 bytes.
how do I know that after copying, other data that was after s[] in the memory
wasn't deleted?
and one more question please: char *s is identical to char* s?
Thank you !
As you correctly point out, passing s[3] as the first argument is going to overwrite some memory that could well be used by something else. At best your program will crash right there and then; at worst, it will carry on running, damaged, and eventually end up corrupting something it was supposed to handle.
The intended way to do this in C is to never pass an array shorter than required.
By the way, it looks like you've swapped s and t; what was meant was probably this:
void strcpy(char *t, char *s) {
int i = 0;
while ((t[i] = s[i]) != '\0')
i++;
}
You can now copy s[4] into t[7] using this amended strcpy routine:
char t[] = "Hellow";
char s[] = "Dad";
strcpy(t, s);
(edit: the length of s is now fixed)
About the first question.
If you're lucky your program will crash.
If you are not it will keep on running and overwrite memory areas that shouldn't be touched (as you don't know what's actually in there). This would be a hell to debug...
About the second question.
Both char* s and char *s do the same thing. It's just a matter of style.
That is, char* s could be interpreted as "s is of type char pointer" while char *s could be interpreted as "s is a pointer to a char". But really, syntactically it's the same.
That example does nothing, you're not invoking strcpy yet. But if you did this:
strcpy(s,t);
It would be wrong in several ways:
The string s is not null terminated. In C the only way strcpy can know where a string ends is by finding the '\0'. The function may think that s is infinite and it might corrupt your memory and make the program crash.
Even if it were null terminated, as you said the size of s is only 3. For the same reason, strcpy would write to memory beyond the end of s, with possibly catastrophic results.
The workaround for this in C is the function strncpy(dst, src, max), in which you specify the maximum number of chars to copy. Still, beware that this function produces a string that is not null terminated if src is max chars long or longer; if src is shorter, the rest of dst is padded with '\0'.
I will assume that both s and t (above the function definition) are arrays of char.
how do I know that after copying, other data that was after s[] in the memory wasn't deleted?
No, this is worse, you are invoking undefined behavior and we know this because the standard says so. All you are allowed to do after the three elements in s is compare. Assignment is a strict no-no. Advance further, and you're not even allowed to compare.
and one more question please: char *s is identical to char* s?
In most cases it is a matter of style where you put the asterisk, except when you declare or define more than one variable in a single declaration, in which case you need to attach one to every variable that is meant to be a pointer.
a string-literal "Hellow\0" is equal to "Hellow"
if you define
t[7] = "Hellow";
s[7] = "Dad";
your example is well defined and does not crash.
char *strtok(char *s1, const char *s2)
repeated calls to this function break string s1 into "tokens"--that is
the string is broken into substrings,
each terminating with a '\0', where
the '\0' replaces any characters
contained in string s2. The first call
uses the string to be tokenized as s1;
subsequent calls use NULL as the first
argument. A pointer to the beginning
of the current token is returned; NULL
is returned if there are no more
tokens.
Hi,
I have been trying to use strtok just now and found out that if I pass in a char* into s1, I get a segmentation fault. If I pass in a char[], strtok works fine.
Why is this?
I googled around and the reason seems to be something about how char* is read only and char[] is writeable. A more thorough explanation would be much appreciated.
What did you initialize the char * to?
If something like
char *text = "foobar";
then you have a pointer to some read-only characters
For
char text[7] = "foobar";
then you have a seven element array of characters that you can do what you like with.
strtok writes into the string you give it - overwriting the separator character with null and keeping a pointer to the rest of the string.
Hence, if you pass it a read-only string, it will attempt to write to it, and you get a segfault.
Also, because strtok keeps a reference to the rest of the string, it's not reentrant: you can use it on only one string at a time. It's best avoided, really. Consider strsep(3) instead; see, for example, here: http://www.rt.com/man/strsep.3.html (although that still writes into the string, so it has the same read-only/segfault issue).
An important point that's inferred but not stated explicitly:
Based on your question, I'm guessing that you're fairly new to programming in C, so I'd like to explain a little more about your situation. Forgive me if I'm mistaken; C can be hard to learn mostly because of subtle misunderstanding in underlying mechanisms so I like to make things as plain as possible.
As you know, when you write out your C program the compiler pre-creates everything for you based on the syntax. When you declare a variable anywhere in your code, e.g.:
int x = 0;
The compiler reads this line of text and says to itself: OK, I need to replace all occurrences in the current code scope of x with a constant reference to a region of memory I've allocated to hold an integer.
When your program is run, this line leads to a new action: I need to set the region of memory that x references to int value 0.
Note the subtle difference here: the memory location that reference point x holds is constant (and cannot be changed). However, the value stored at that location can be changed. You do it in your code through assignment, e.g. x = 15;. Also note that the single line of code actually amounts to two separate commands to the compiler.
When you have a statement like:
char *name = "Tom";
The compiler's process is like this: OK, I need to replace all occurrences in the current code scope of name with a constant reference to a region of memory I've allocated to hold a char pointer value. And it does so.
But there's that second step, which amounts to this: I need to create a constant array of characters which holds the values 'T', 'o', 'm', and the terminating null character '\0'. Then I need to replace the part of the code which says "Tom" with the memory address of that constant string.
When your program is run, the final step occurs: setting the pointer to char's value (which isn't constant) to the memory address of that automatically created string (which is constant).
So a char * is not read-only. Only a const char * is read-only. But your problem in this case isn't that char *s are read-only, it's that your pointer references a read-only region of memory.
I bring all this up because understanding this issue is the barrier between you looking at the definition of that function from the library and understanding the issue yourself versus having to ask us. And I've somewhat simplified some of the details in the hopes of making the issue more understandable.
I hope this was helpful. ;)
I blame the C standard.
char *s = "abc";
could have been defined to give the same error as
const char *cs = "abc";
char *s = cs;
on grounds that string literals are unmodifiable. But it wasn't, it was defined to compile. Go figure. [Edit: Mike B has gone figured - "const" didn't exist at all in K&R C. ISO C, plus every version of C and C++ since, has wanted to be backward-compatible. So it has to be valid.]
If it had been defined to give an error, then you couldn't have got as far as the segfault, because strtok's first parameter is char*, so the compiler would have prevented you passing in the pointer generated from the literal.
It may be of interest that there was at one time a plan in C++ for this to be deprecated (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/1996/N0896.asc). But 12 years later I can't persuade either gcc or g++ to give me any kind of warning for assigning a literal to non-const char*, so it isn't all that loudly deprecated.
[Edit: aha: -Wwrite-strings, which isn't included in -Wall or -Wextra]
In brief:
char *s = "HAPPY DAY";
printf("\n %s ", s);
s = "NEW YEAR"; /* Valid */
printf("\n %s ", s);
s[0] = 'c'; /* Invalid */
If you look at your compiler documentation, odds are there is an option you can set to make those strings writable.
Here's a simple example of a program that concatenates two strings.
#include <stdio.h>
void strcat(char *s, char *t);
void strcat(char *s, char *t) {
while (*s++ != '\0');
s--;
while ((*s++ = *t++) != '\0');
}
int main() {
char *s = "hello";
strcat(s, " world");
while (*s != '\0') {
putchar(*s++);
}
return 0;
}
I'm wondering why it works. In main(), I have a pointer to the string "hello". According to the K&R book, modifying a string like that is undefined behavior. So why is the program able to modify it by appending " world"? Or is appending not considered as modifying?
Undefined behavior means a compiler can emit code that does anything. Working is a subset of undefined.
I +1'd MSN, but as for why it works, it's because nothing has come along to fill the space behind your string yet. Declare a few more variables, add some complexity, and you'll start to see some wackiness.
Perhaps surprisingly, your compiler has allocated the literal "hello" into read/write initialized data instead of read-only initialized data. Your assignment clobbers whatever is adjacent to that spot, but your program is small and simple enough that you don't see the effects. (Put it in a for loop and see if you are clobbering the " world" literal.)
It fails on Ubuntu x64 because gcc puts string literals in read-only data, and when you try to write, the hardware MMU objects.
You were lucky this time.
Especially in debug mode some compilers will put spare memory (often filled with some obvious value) around declarations so you can find code like this.
It also depends on how the pointer is declared. For example, can change ptr, and what ptr points to:
char * ptr;
Can change what ptr points to, but not ptr:
char * const ptr;
Can change ptr, but not what ptr points to:
const char * ptr;
Can't change anything:
const char * const ptr;
According to the C99 specification (C99: TC3, 6.4.5, §5), string literals are
[...] used to initialize an array of static storage duration and length just
sufficient to contain the sequence. [...]
which means they have the type char [], ie modification is possible in principle. Why you shouldn't do it is explained in §6:
It is unspecified whether these arrays are distinct provided their elements have the
appropriate values. If the program attempts to modify such an array, the behavior is
undefined.
Different string literals with the same contents may - but don't have to - be mapped to the same memory location. As the behaviour is undefined, compilers are free to put them in read-only sections in order to cleanly fail instead of introducing possibly hard to detect error sources.
I'm wondering why it works
It doesn't. It causes a Segmentation Fault on Ubuntu x64; for code to work it shouldn't just work on your machine.
Moving the modified data to the stack gets around the data area protection in Linux:
int main() {
char b[] = "hello";
char c[] = " ";
char *s = b;
strcat(s, " world");
puts(b);
puts(c);
return 0;
}
Though you are then only safe as long as " world" fits in the unused space between stack data. Change b to "hello to" and Linux detects the stack corruption:
*** stack smashing detected ***: bin/clobber terminated
The compiler is allowing you to modify s because you have improperly marked it as non-const -- a pointer to a static string like that should be
const char *s = "hello";
With the const modifier missing, you've basically disabled the safety that prevents you from writing into memory that you shouldn't write into. C does very little to keep you from shooting yourself in the foot. In this case you got lucky and only grazed your pinky toe.
s points to a bit of memory that holds "hello", but was not intended to contain more than that. This means that it is very likely that you will be overwriting something else. That is very dangerous, even though it may seem to work.
Two observations:
The * in *s-- is not necessary. s-- would suffice, because you only want to decrement the value.
You don't need to write strcat yourself. It already exists (you probably knew that, but I'm telling you anyway:-)).