This question already has an answer here:
Is modification of string literals undefined behaviour according to the C89 standard?
(1 answer)
Closed 7 years ago.
So I've been playing around with C lately and have been trying to understand the intricacies of passing by value/reference and the ability to manipulate a passed-in variable inside a function. I've hit a road block, however, with the following:
void modifyCharArray(char *input)
{
//change input[0] to 'D'
input[0] = 'D';
}
int main()
{
char *test = "Bad";
modifyCharArray(test);
printf("Bad --> %s\n", test);
}
So the idea was to just modify a char array inside a function, and then print out said array after the modification completed. However, this fails, since all I'm doing is modifying the value of input that is passed in and not the actual memory address.
In short, is there any way I can take in a char *input into a function and modify its original memory address without using something like memcpy from string.h?
In short, is there any way I can take in a char *input into a function and modify its original memory address without using something like memcpy from string.h?
Yes, you can. Your function modifyCharArray is doing the right thing. What you are seeing is caused by that fact that
char *test = "Bad";
creates "Bad" in read only memory of the program and test points to that memory. Changing it is cause for undefined behavior.
If you want to create a modifiable string, use:
char test[] = "Bad";
Related
This question already has an answer here:
Returning Local Variable Pointers - C [duplicate]
(1 answer)
Closed 7 years ago.
I want to know why doesn't the following work correctly? Though I have tried the other ways that work, but for the sake of more clarity I would like to know the problem occurring here.
char *fuc(char *s)
{
char t[10];
int r=0;
while(s[r] != '\0')
{
t[r] = s[r];
r++;
}
t[r]='\0';
return &t[0];
}
main()
{
char s[]="abcde";
char *p;
p=func(s);
puts(p);
}
In your fuc(), char t[10]; is a local variable. Once your function finishes execution, there is no existence of t. So, in the caller, the returned pointer becomes invalid.
Using that returned pointer further leads to undefined behaviour.
If you want to return the pointer from fuc(), you need to make use of dynamic memory allocation function , like malloc() and family. In that case, inside the caller function, once you're done using the memory, you need to take care for free()-ing the allocated memory, too.
That said, from the logical point of view, inside your fuc(), you're iterating over t without any check on the bounds. You should check for the size of t before using the index.
Furthermore, main() is not a proper form of the function. At Least, it should be int main(void).
Array t is local to function func() so once you exit the function you can't access array t which will lead to undefined behavior.
You should allocate memory on heap.
char *t = malloc(10);
Now you can return a pointer from the function func()
This question already has answers here:
Returning an array using C
(8 answers)
Closed 8 years ago.
I have an idea on dangling pointer. I know that the following program will produce a dangling pointer.But I couldnt understand the output of the program
char *getString()
{
char str[] = "Stack Overflow ";
return str;
}
int main()
{
char *s=getString();
printf("%c\n",s[1]);
printf("%s",s); // Statement -1
printf("%s\n",s); // Statement -2
return 0;
}
The output of the following program is
t
if only Statement-1 is there then output is some grabage values
if only Statement-2 is there then output is new line
Your code shows undefined behaviour, as you're returning the address of a local variable.
There is no existence of str once the getString() function has finished execution and returned.
As for the question,
if only Statement-1 is there then output is some grabage values if only Statement-2 is there then output is new line
No explanations. Once your program exhibits undefined behaviour, the output cannot be predicted, that's all. [who knows, it might print your cell phone number, too, or a daemon may fly out of my nose]
For simple logical part, adding a \n in printf() will cause the output buffer to be flushed to the output immediately. [Hint: stdout is line buffered.]
Solution:
You can do your job either of the two ways stated below
Take a pointer, allocate memory dynamically inside getString() and return the pointer. (I'd recommend this). Also, free() it later in main() once you're done.
make the char str[] static so that the scope is not limited to the lifetime of the function. (not so good, but still will do the job)
your str in getString is a local variable, which is allocate on stack, and when the function returns, it doesn't exist anymore.
I suggest you rewrite getString() like this
char *getString()
{
char str[] = "Stack Overflow ";
char *tmp = (char*)malloc(sizeof(char)*strlen(str));
memcpy(tmp, str, strlen(str));
return tmp;
}
and you need to add
free(s);
before return 0;
In my case, pointer tmp points to a block memory on heap, which will exist till your program ends
you need to know more about stack and heap
Besides, there is still another way, use static variable instead
char *getString()
{
static char str[] = "Stack Overflow ";
return str;
}
PS: You get the correct answer for the following statement printf("%c\n",s[1]); is just a coincidence. Opera System didn't have time to do some clean work when you return from function. But it will
Array is returned as a pointer yet the array itself is the garbage after return from function. Just use static modifier.
What's concerning s[1] is OK. The point is, it's the first printf after getting the dangling pointer. So, the stack at this point is still (probably) intact. You should recall that stack is used for function calls and local variables only (in DOS it could be used by system interrupts, but now it's not the case). So, before the first printf (when s[1] is calc'ed), s[] is OK, but after - it's not (printf' code had messed it up). I hope, now it's clear.
consider the following code:
t[7] = "Hellow\0";
s[3] = "Dad";
//now copy t to s using the following strcpy function:
void strcpy(char *s, char *t) {
int i = 0;
while ((s[i] = t[i]) != '\0')
i++;
}
the above code is taken from "The C programming Language book".
my question is - we are copying 7 bytes to what was declared as 3 bytes.
how do I know that after copying, other data that was after s[] in the memory
wasn't deleted?
and one more question please: char *s is identical to char* s?
Thank you !
As you correctly point out, passing s[3] as the first argument is going to overwrite some memory that could well be used by something else. At best your program will crash right there and then; at worst, it will carry on running, damaged, and eventually end up corrupting something it was supposed to handle.
The intended way to do this in C is to never pass an array shorter than required.
By the way, it looks like you've swapped s and t; what was meant was probably this:
void strcpy(char *t, char *s) {
int i = 0;
while ((t[i] = s[i]) != '\0')
i++;
}
You can now copy s[4] into t[7] using this amended strcpy routine:
char t[] = "Hellow";
char s[] = "Dad";
strcpy(t, s);
(edit: the length of s is now fixed)
About the first question.
If you're lucky your program will crash.
If you are not it will keep on running and overwrite memory areas that shouldn't be touched (as you don't know what's actually in there). This would be a hell to debug...
About the second question.
Both char* s and char *s do the same thing. It's just a matter of style.
That is, char* s could be interpreted as "s is of type char pointer" while char *s could be interpreted as "s is a pointer to a char". But really, syntactically it's the same.
That example does nothing, you're not invoking strcpy yet. But if you did this:
strcpy(s,t);
It would be wrong in several ways:
The string s is not null terminated. In C the only way strcpy can know where a string ends is by finding the '\0'. The function may think that s is infinite and it might corrupt your memory and make the program crash.
Even if was null terminated, as you said the size of s is only 3. Because of the same cause, strcpy would write memory beyond where s ends, with maybe catastrophic results.
The workaround for this in C is the function strncpy(dst, src, max) in which you specify the maximum number of chars to copy. Still beware that this function might generate a not null terminated string if src is shorter than max chars.
I will assume that both s and t (above the function definition) are arrays of char.
how do I know that after copying, other data that was after s[] in the memory wasn't deleted?
No, this is worse, you are invoking undefined behavior and we know this because the standard says so. All you are allowed to do after the three elements in s is compare. Assignment is a strict no-no. Advance further, and you're not even allowed to compare.
and one more question please: char s is identical to char s?
In most cases it is a matter of style where you stick your asterix except if you are going to declare/define more than one, in which case you need to stick one to every variable you are going to name (as a pointer).
a string-literal "Hellow\0" is equal to "Hellow"
if you define
t[7] = "Hellow";
s[7] = "Dad";
your example is defined and crashes not.
char *strtok(char *s1, const char *s2)
repeated calls to this function break string s1 into "tokens"--that is
the string is broken into substrings,
each terminating with a '\0', where
the '\0' replaces any characters
contained in string s2. The first call
uses the string to be tokenized as s1;
subsequent calls use NULL as the first
argument. A pointer to the beginning
of the current token is returned; NULL
is returned if there are no more
tokens.
Hi,
I have been trying to use strtok just now and found out that if I pass in a char* into s1, I get a segmentation fault. If I pass in a char[], strtok works fine.
Why is this?
I googled around and the reason seems to be something about how char* is read only and char[] is writeable. A more thorough explanation would be much appreciated.
What did you initialize the char * to?
If something like
char *text = "foobar";
then you have a pointer to some read-only characters
For
char text[7] = "foobar";
then you have a seven element array of characters that you can do what you like with.
strtok writes into the string you give it - overwriting the separator character with null and keeping a pointer to the rest of the string.
Hence, if you pass it a read-only string, it will attempt to write to it, and you get a segfault.
Also, becasue strtok keeps a reference to the rest of the string, it's not reeentrant - you can use it only on one string at a time. It's best avoided, really - consider strsep(3) instead - see, for example, here: http://www.rt.com/man/strsep.3.html (although that still writes into the string so has the same read-only/segfault issue)
An important point that's inferred but not stated explicitly:
Based on your question, I'm guessing that you're fairly new to programming in C, so I'd like to explain a little more about your situation. Forgive me if I'm mistaken; C can be hard to learn mostly because of subtle misunderstanding in underlying mechanisms so I like to make things as plain as possible.
As you know, when you write out your C program the compiler pre-creates everything for you based on the syntax. When you declare a variable anywhere in your code, e.g.:
int x = 0;
The compiler reads this line of text and says to itself: OK, I need to replace all occurrences in the current code scope of x with a constant reference to a region of memory I've allocated to hold an integer.
When your program is run, this line leads to a new action: I need to set the region of memory that x references to int value 0.
Note the subtle difference here: the memory location that reference point x holds is constant (and cannot be changed). However, the value that x points can be changed. You do it in your code through assignment, e.g. x = 15;. Also note that the single line of code actually amounts to two separate commands to the compiler.
When you have a statement like:
char *name = "Tom";
The compiler's process is like this: OK, I need to replace all occurrences in the current code scope of name with a constant reference to a region of memory I've allocated to hold a char pointer value. And it does so.
But there's that second step, which amounts to this: I need to create a constant array of characters which holds the values 'T', 'o', 'm', and NULL. Then I need to replace the part of the code which says "Tom" with the memory address of that constant string.
When your program is run, the final step occurs: setting the pointer to char's value (which isn't constant) to the memory address of that automatically created string (which is constant).
So a char * is not read-only. Only a const char * is read-only. But your problem in this case isn't that char *s are read-only, it's that your pointer references a read-only regions of memory.
I bring all this up because understanding this issue is the barrier between you looking at the definition of that function from the library and understanding the issue yourself versus having to ask us. And I've somewhat simplified some of the details in the hopes of making the issue more understandable.
I hope this was helpful. ;)
I blame the C standard.
char *s = "abc";
could have been defined to give the same error as
const char *cs = "abc";
char *s = cs;
on grounds that string literals are unmodifiable. But it wasn't, it was defined to compile. Go figure. [Edit: Mike B has gone figured - "const" didn't exist at all in K&R C. ISO C, plus every version of C and C++ since, has wanted to be backward-compatible. So it has to be valid.]
If it had been defined to give an error, then you couldn't have got as far as the segfault, because strtok's first parameter is char*, so the compiler would have prevented you passing in the pointer generated from the literal.
It may be of interest that there was at one time a plan in C++ for this to be deprecated (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/1996/N0896.asc). But 12 years later I can't persuade either gcc or g++ to give me any kind of warning for assigning a literal to non-const char*, so it isn't all that loudly deprecated.
[Edit: aha: -Wwrite-strings, which isn't included in -Wall or -Wextra]
In brief:
char *s = "HAPPY DAY";
printf("\n %s ", s);
s = "NEW YEAR"; /* Valid */
printf("\n %s ", s);
s[0] = 'c'; /* Invalid */
If you look at your compiler documentation, odds are there is a option you can set to make those strings writable.
Here's a simple example of a program that concatenates two strings.
#include <stdio.h>
void strcat(char *s, char *t);
void strcat(char *s, char *t) {
while (*s++ != '\0');
s--;
while ((*s++ = *t++) != '\0');
}
int main() {
char *s = "hello";
strcat(s, " world");
while (*s != '\0') {
putchar(*s++);
}
return 0;
}
I'm wondering why it works. In main(), I have a pointer to the string "hello". According to the K&R book, modifying a string like that is undefined behavior. So why is the program able to modify it by appending " world"? Or is appending not considered as modifying?
Undefined behavior means a compiler can emit code that does anything. Working is a subset of undefined.
I +1'd MSN, but as for why it works, it's because nothing has come along to fill the space behind your string yet. Declare a few more variables, add some complexity, and you'll start to see some wackiness.
Perhaps surprisingly, your compiler has allocated the literal "hello" into read/write initialized data instead of read-only initialized data. Your assignment clobbers whatever is adjacent to that spot, but your program is small and simple enough that you don't see the effects. (Put it in a for loop and see if you are clobbering the " world" literal.)
It fails on Ubuntu x64 because gcc puts string literals in read-only data, and when you try to write, the hardware MMU objects.
You were lucky this time.
Especially in debug mode some compilers will put spare memory (often filled with some obvious value) around declarations so you can find code like this.
It also depends on the how the pointer is declared. For example, can change ptr, and what ptr points to:
char * ptr;
Can change what ptr points to, but not ptr:
char const * ptr;
Can change ptr, but not what ptr points to:
const char * ptr;
Can't change anything:
const char const * ptr;
According to the C99 specifification (C99: TC3, 6.4.5, §5), string literals are
[...] used to initialize an array of static storage duration and length just
sufficient to contain the sequence. [...]
which means they have the type char [], ie modification is possible in principle. Why you shouldn't do it is explained in §6:
It is unspecified whether these arrays are distinct provided their elements have the
appropriate values. If the program attempts to modify such an array, the behavior is
undefined.
Different string literals with the same contents may - but don't have to - be mapped to the same memory location. As the behaviour is undefined, compilers are free to put them in read-only sections in order to cleanly fail instead of introducing possibly hard to detect error sources.
I'm wondering why it works
It doesn't. It causes a Segmentation Fault on Ubuntu x64; for code to work it shouldn't just work on your machine.
Moving the modified data to the stack gets around the data area protection in linux:
int main() {
char b[] = "hello";
char c[] = " ";
char *s = b;
strcat(s, " world");
puts(b);
puts(c);
return 0;
}
Though you then are only safe as 'world' fits in the unused spaces between stack data - change b to "hello to" and linux detects the stack corruption:
*** stack smashing detected ***: bin/clobber terminated
The compiler is allowing you to modify s because you have improperly marked it as non-const -- a pointer to a static string like that should be
const char *s = "hello";
With the const modifier missing, you've basically disabled the safety that prevents you from writing into memory that you shouldn't write into. C does very little to keep you from shooting yourself in the foot. In this case you got lucky and only grazed your pinky toe.
s points to a bit of memory that holds "hello", but was not intended to contain more than that. This means that it is very likely that you will be overwriting something else. That is very dangerous, even though it may seem to work.
Two observations:
The * in *s-- is not necessary. s-- would suffice, because you only want to decrement the value.
You don't need to write strcat yourself. It already exists (you probably knew that, but I'm telling you anyway:-)).