I have a question about how is the correct way of manipulate the initialization of c strings
For example the next code, isn't always correct.
char *something;
something = "zzzzzzzzzzzzzzzzzz";
i test a little incrementing the number of zetas and effectively the program crash in like about two lines, so what is the real size limit in this char array? how can i be sure that it is not going to crash, is this limit implementation dependent? Is the following code the correct approach that i always must use?
char something[FIXEDSIZE];
strcpy(something, "zzzzzzzzzzzzzzzzzzz");
As you say, manipulating this string leads to undefined behaviour:
char *something;
something = "zzzzzzzzzzzzzzzzzz";
If you are curious as to why, see "C String literals: Where do they go?".
If you plan to manipulate your string at all, (i.e. if you want it to be mutable) you should use this:
char something[] = "skjdghskfjhgfsj";
Otherwise, simply declare your char * as a const char * to indicate that it points to a constant.
In the second example, the compiler will be smart enough to declare this as an array on the stack of the exact size to hold the string. Thus, the size of this is limited by your stack.
Of course, you will likely want to specify the size anyway, since it is usually useful to know when manipulating the string.
The second is always correct.
The first is correct only if you never change the string, since you've assigned a pointer to fixed data.
The first example is only incorrect in that char *something should really be const char *something. Otherwise, this:
const char *something = "fooooooooooooooooooooooobar";
...should work, and should not crash.
char something[FIXEDSIZE];
...this one, however, can typically crash with a stack overflow if you, well, overflow the stack, which depends on how big that stack is, how big that array is, where this gets called, etc.
first should never crash. second will crash as soon as the number of 'z' + 1 go over the available space on the stack page, or if you try to return from the function.
Related
char * a = (char *) malloc(10);
strcpy(a,"string1");
char * x = "string2";
strcat(a,x);
printf("\n%s",a);
Here, I allocated only 10B to a, but still after concatenating a and x (combined size is 16B), C prints the answer without any problem.
But if I do this:
char * a = "string1";
char * x = "string2";
strcat(a,x);
printf("\n%s",a);
Then I get a segfault. Why is this? Why does the first one work despite lower memory allocation? Does strcat reallocate memory for me? If yes, why does the second one not work? Is it because a & x declared that way are unmodifiable string literals?
In your first example, a is allocated in the heap. So when you're concatenating the other string, something in the heap will be overwritten, but there is no write-protection.
In your second example, a points to a region of the memory that contains constants, and is readonly. Hence the seg fault.
The first one doesn't always work, it already caused an overflow. The second one, a is a pointer to the constant string which is stored in the data section, in a read-only page.
In the 2nd case what you have is a pointer to unmodifiable string literals,
In 1st case, you are printing out a heap memory location and in that case its undefined, you cannot guarantee that it will work every time.
(may be write it in a very large loop, yo may see this undefined behavior)
Your code is writing beyond the buffer that it's permitted, which causes undefined behavior. This can work and it can fail, and worse: it can look like it worked but cause seemingly unrelated failures later. The language allows you to do things like this because you're supposed to know what you're doing, but it's not recommended practice.
In your first case, of having used malloc to allocate buffers, you're actually being helped but not in a manner you should ever rely on. The malloc function allocates at least as much space as you've requested, but in practice it typically rounds up to a multiple of 16... so your malloc(10); probably got a 16 byte buffer. This is implementation specific and it's never a good idea to rely on something like that.
In your second case, it's likely that the memory pointed to by your a (and x) variable(s) is non-writable, which is why you've encountered a segfault.
This may be a very basic question for some. I was trying to understand how strcpy works actually behind the scenes. for example, in this code
#include <stdio.h>
#include <string.h>
int main ()
{
char s[6] = "Hello";
char a[20] = "world isnsadsdas";
strcpy(s,a);
printf("%s\n",s);
printf("%d\n", sizeof(s));
return 0;
}
As I am declaring s to be a static array with size less than that of source. I thought it wont print the whole word, but it did print world isnsadsdas .. So, I thought that this strcpy function might be allocating new size if destination is less than the source. But now, when I check sizeof(s), it is still 6, but it is printing out more than that. Hows that working actually?
You've just caused undefined behaviour, so anything can happen. In your case, you're getting lucky and it's not crashing, but you shouldn't rely on that happening. Here's a simplified strcpy implementation (but it's not too far off from many real ones):
char *strcpy(char *d, const char *s)
{
char *saved = d;
while (*s)
{
*d++ = *s++;
}
*d = 0;
return saved;
}
sizeof is just returning you the size of your array from compile time. If you use strlen, I think you'll see what you expect. But as I mentioned above, relying on undefined behaviour is a bad idea.
http://natashenka.ca/wp-content/uploads/2014/01/strcpy8x11.png
strcpy is considered dangerous for reasons like the one you are demonstrating. The two buffers you created are local variables stored in the stack frame of the function. Here is roughly what the stack frame looks like:
http://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/Call_stack_layout.svg/342px-Call_stack_layout.svg.png
FYI things are put on top of the stack meaning it grows backwards through memory (This does not mean the variables in memory are read backwards, just that newer ones are put 'behind' older ones). So that means if you write far enough into the locals section of your function's stack frame, you will write forward over every other stack variable after the variable you are copying to and break into other sections, and eventually overwrite the return pointer. The result is that if you are clever, you have full control of where the function returns. You could make it do anything really, but it isn't YOU that is the concern.
As you seem to know by making your first buffer 6 chars long for a 5 character string, C strings end in a null byte \x00. The strcpy function copies bytes until the source byte is 0, but it does not check that the destination is that long, which is why it can copy over the boundary of the array. This is also why your print is reading the buffer past its size, it reads till \x00. Interestingly, the strcpy may have written into the data of s depending on the order the compiler gave it in the stack, so a fun exercise could be to also print a and see if you get something like 'snsadsdas', but I can't be sure what it would look like even if it is polluting s because there are sometimes bytes in between the stack entries for various reasons).
If this buffer holds say, a password to check in code with a hashing function, and you copy it to a buffer in the stack from wherever you get it (a network packet if a server, or a text box, etc) you very well may copy more data from the source than the destination buffer can hold and give return control of your program to whatever user was able to send a packet to you or try a password. They just have to type the right number of characters, and then the correct characters that represent an address to somewhere in ram to jump to.
You can use strcpy if you check the bounds and maybe trim the source string, but it is considered bad practice. There are more modern functions that take a max length like http://www.cplusplus.com/reference/cstring/strncpy/
Oh and lastly, this is all called a buffer overflow. Some compilers add a nice little blob of bytes randomly chosen by the OS before and after every stack entry. After every copy the OS checks these bytes against its copy and terminates the program if they differ. This solves a lot of security problems, but it is still possible to copy bytes far enough into the stack to overwrite the pointer to the function to handle what happens when those bytes have been changed thus letting you do the same thing. It just becomes a lot harder to do right.
In C there is no bounds checking of arrays, its a trade off in order to have better performance at the risk of shooting yourself in the foot.
strcpy() doesn't care whether the target buffer is big enough so copying too many bytes will cause undefined behavior.
that is one of the reasons that a new version of strcpy were introduced where you can specify the target buffer size strcpy_s()
Note that sizeof(s) is determined at run time. Use strlen() to find the number of characters s occupied. When you perform strcpy() source string will be replaced by destination string so your output wont be "Helloworld isnsadsdas"
#include <stdio.h>
#include <string.h>
main ()
{
char s[6] = "Hello";
char a[20] = "world isnsadsdas";
strcpy(s,a);
printf("%s\n",s);
printf("%d\n", strlen(s));
}
You are relying on undefined behaviour in as much as that the compiler has chose to place the two arrays where your code happens to work. This may not work in future.
As to the sizeof operator, this is figured out at compile time.
Once you use adequate array sizes you need to use strlen to fetch the length of the strings.
The best way to understand how strcpy works behind the scene is...reading its source code!
You can read the source for GLibC : http://fossies.org/dox/glibc-2.17/strcpy_8c_source.html . I hope it helps!
At the end of every string/character array there is a null terminator character '\0' which marks the end of the string/character array.
strcpy() preforms its task until it sees the '\0' character.
printf() also preforms its task until it sees the '\0' character.
sizeof() on the other hand is not interested in the content of the array, only its allocated size (how big it is supposed to be), thus not taking into consideration where the string/character array actually ends (how big it actually is).
As opposed to sizeof(), there is strlen() that is interested in how long the string actually is (not how long it was supposed to be) and thus counts the number of characters until it reaches the end ('\0' character) where it stops (it doesn't include the '\0' character).
Better Solution is
char *strcpy(char *p,char const *q)
{
char *saved=p;
while(*p++=*q++);
return saved;
}
Sorry if this is a bit of a starter question but I am pretty new to C. I am using the GCC complier. When I write a program with a string in, if the string is beyond a certain length it appears to start with some contents. I am worried about just overwritting it as it could be being used by another program. Here is an example code that shows the issue:
#include <stdio.h>
// Using the GCC Compiler
// Why is there already something in MyString?
int main(void) {
char MyString[250];
printf("%s", MyString);
getch();
return 0;
}
How do I SAFELY avoid this issue? Thanks for your help.
Why is there already something in MyString?
myString is not initiailized and can contain anything.
To initialize to an empty string:
char MyString[250] = { 0 };
or as pointed out by unwind in his answer:
char MyString[250] = "";
which is more readable (and consistent with the following).
To initialize to a string:
char myString[250] = "some-string";
I am worried about just overwritting it as it could be being used by another program
Each running instance of your program will have its own myString.
For some reason many are recommending the array-style initialization of
char myString[50] = { 0 };
however, since this array is intended to be used as a string, I find it far clearer and more intuitive (and simpler syntactically) to use a string initializer:
char myString[50] = "";
This does exactly the same thing, but makes it quite a lot clearer that what you intend to initialize the array as is in fact an empty string.
The situation you're seeing with "random" data is just what happens to be in the array, since you are not initializing it you simply get what happens to be there. This does not mean that the memory is being used by some other program at the same time, so you don't need to worry about that. You do need to worry about handing a pointer to an array of char that is not properly 0-terminated to any C function expecting a string, though.
Technically you are then invoking undefined behavior, which is something you should avoid. It can easily crash your program, since there's no telling how far away into memory you might end up. Operating systems are free to kill processes that try to access memory that they're not allowed to touch.
Properly initializing the array to an empty string avoids this issue.
The problem is that your string is not initialized.
A C-String ends with ends with '\0', so you should simply put something like
MyString[0] = '\0';
behind your declaration. This way you make sure that functions like printf work the way you expect them to work.
char MyString[250] = {0};
but for good use
std::string
Since you have initialzed the char array to any value, it'll contain some garbage value. It's a good programming practice to use something like:
char MyString[250] = "My Array"; // If you know the array to be used
char MyString[250] = '\0'; // If you don't intend to fill the char array data during initialization
I'm trying to write a simple C program on Ubuntu using Eclipse CDT (yes, I'm more comfortable with an IDE and I'm used to Eclipse from Java development), and I'm stuck with something weird. On one part of my code, I initialize a char array in a function, and it is by default pointing to the same location with one of the inputs, which has nothing to do with that char array. Here is my code:
char* subdir(const char input[], const char dir[]){
[*] int totallen = strlen(input) + strlen(dir) + 2;
char retval[totallen];
strcpy(retval, input);
strcat(retval, dir);
...}
Ok at the part I've marked with [*], there is a checkpoint. Even at that breakpoint, when I check y locals, I see that retval is pointing to the same address with my argument input. It not even possible as input comes from another function and retval is created in this function. Is is me being unexperienced with C and missing something, or is there a bug somewhere with the C compiler?
It seems so obvious to me that they should't point to the same (and a valid, of course, they aren't NULL) location. When the code goes on, it literally messes up everything; I get random characters and shapes in console and the program crashes.
I don't think it makes sense to check the address of retval BEFORE it appears, it being a VLA and all (by definition the compiler and the debugger don't know much about it, it's generated at runtime on the stack).
Try checking its address after its point of definition.
EDIT
I just read the "I get random characters and shapes in console". It's obvious now that you are returning the VLA and expecting things to work.
A VLA is only valid inside the block where it was defined. Using it outside is undefined behavior and thus very dangerous. Even if the size were constant, it still wouldn't be valid to return it from the function. In this case you most definitely want to malloc the memory.
What cnicutar said.
I hate people who do this, so I hate me ... but ... Arrays of non-const size are a C99 extension and not supported by C++. Of course GCC has extensions to make it happen.
Under the covers you are essentially doing an _alloca, so your odds of blowing out the stack are proportional to who has access to abuse the function.
Finally, I hope it doesn't actually get returned, because that would be returning a pointer to a stack allocated array, which would be your real problem since that array is gone as of the point of return.
In C++ you would typically use a string class.
In C you would either pass a pointer and length in as parameters, or a pointer to a pointer (or return a pointer) and specify the calls should call free() on it when done. These solutions all suck because they are error prone to leaks or truncation or overflow. :/
Well, your fundamental problem is that you are returning a pointer to the stack allocated VLA. You can't do that. Pointers to local variables are only valid inside the scope of the function that declares them. Your code results in Undefined Behaviour.
At least I am assuming that somewhere in the ..... in the real code is the line return retval.
You'll need to use heap allocation, or pass a suitably sized buffer to the function.
As well as that, you only need +1 rather than +2 in the length calculation - there is only one null-terminator.
Try changing retval to a character pointer and allocating your buffer using malloc().
Pass the two string arguments as, char * or const char *
Rather than returning char *, you should just pass another parameter with a string pointer that you already malloc'd space for.
Return bool or int describing what happened in the function, and use the parameter you passed to store the result.
Lastly don't forget to free the memory since you're having to malloc space for the string on the heap...
//retstr is not a const like the other two
bool subdir(const char *input, const char *dir,char *retstr){
strcpy(retstr, input);
strcat(retstr, dir);
return 1;
}
int main()
{
char h[]="Hello ";
char w[]="World!";
char *greet=(char*)malloc(strlen(h)+strlen(w)+1); //Size of the result plus room for the terminator!
subdir(h,w,greet);
printf("%s",greet);
return 1;
}
This will print: "Hello World!" added together by your function.
Also when you're creating a string on the fly you must malloc. The compiler doesn't know how long the two other strings are going to be, thus using char greet[totallen]; shouldn't work.
I have been trying to look for a reason why the following code is failing, and I couldn't find one.
So please, excuse my ignorance and let me know what's happening here.
#include<stdio.h>
int main(void){
char* p="Hi, this is not going to work";
scanf("%s",p);
return 0;
}
As far as I understood, I created a pointer p to a contiguous area in the memory of the size 29 + 1(for the \0).
Why can't I use scanf to change the contents of that?
P.S Please correct me If I said something wrong about char*.
char* p="Hi, this is not going to work";
this does not allocate memory for you to write
this creates a String Literal which results inUndefined Behaviour every time you try to change its contents.
to use p as a buffer for your scanf do something like
char * p = malloc(sizeof(char) * 128); // 128 is an Example
OR
you could as well do:
char p[]="Hi, this is not going to work";
Which I guess is what you really wanted to do.
Keep in mind that this can still end up being UB because scanf() does not check whether the place you are using is indeed valid writable memory.
remember :
char * p is a String Literal and should not be modified
char p[] = "..." allocates enough memory to hold the String inside the "..." and may be changed (its contents I mean).
Edit :
A nice trick to avoid UB is
char * p = malloc(sizeof(char) * 128);
scanf("%126s",s);
p points to a constant literal, which may in fact reside in a read-only memory area (implementation dependent). At any rate, trying to overwrite that is undefined behaviour. I.e. it might result in nothing, or an immediate crash, or a hidden memory corruption which causes mysterious problems much later. Don't ever do that.
It is crashing because memory has not been allocated for p. Allocate memory for p and it should be ok. What you have is a constant memory area pointing to by p. When you attempt to write something in this data segment, the runtime environment will raise a trap which will lead to a crash.
Hope this answers your question
scanf() parses data entered from stdin (normally, the keyboard). I think you want sscanf().
However, the purpose of scanf() is to part a string with predefined escape sequences, which your test string doesn't have. So that makes it a little unclear exactly what you are trying to do.
Note that sscanf() takes an additional argument as the first argument, which specifies the string being parsed.