String starting state in C - c

Sorry if this is a bit of a starter question, but I am pretty new to C. I am using the GCC compiler. When I write a program that contains a string, and the string is beyond a certain length, it appears to start out with some contents already in it. I am worried about just overwriting it, as it could be in use by another program. Here is some example code that shows the issue:
#include <stdio.h>
// Using the GCC Compiler
// Why is there already something in MyString?
int main(void) {
char MyString[250];
printf("%s", MyString);
getch();
return 0;
}
How do I SAFELY avoid this issue? Thanks for your help.

Why is there already something in MyString?
MyString is not initialized and can contain anything.
To initialize to an empty string:
char MyString[250] = { 0 };
or as pointed out by unwind in his answer:
char MyString[250] = "";
which is more readable (and consistent with the following).
To initialize to a string:
char myString[250] = "some-string";
I am worried about just overwriting it as it could be being used by another program
Each running instance of your program will have its own myString.

For some reason many are recommending the array-style initialization of
char myString[50] = { 0 };
however, since this array is intended to be used as a string, I find it far clearer and more intuitive (and simpler syntactically) to use a string initializer:
char myString[50] = "";
This does exactly the same thing, but makes it quite a lot clearer that what you intend to initialize the array as is in fact an empty string.
The situation you're seeing with "random" data is just what happens to be in the array, since you are not initializing it you simply get what happens to be there. This does not mean that the memory is being used by some other program at the same time, so you don't need to worry about that. You do need to worry about handing a pointer to an array of char that is not properly 0-terminated to any C function expecting a string, though.
Technically you are then invoking undefined behavior, which is something you should avoid. It can easily crash your program, since there's no telling how far away into memory you might end up. Operating systems are free to kill processes that try to access memory that they're not allowed to touch.
Properly initializing the array to an empty string avoids this issue.
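As a rough check of the claim that the two initializers do the same thing, here is a minimal sketch. The helper names all_zero and initializers_agree are made up for illustration and do not come from the question; the sketch relies on the C rule that elements not covered by an initializer are set to zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if every one of the n bytes in buf is zero. */
static int all_zero(const char *buf, size_t n)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++) {
        if (buf[i] != '\0')
            return 0;
    }
    return 1;
}

/* Returns 1 if both initializer spellings produce a fully zeroed array
   that also reads as an empty string. */
int initializers_agree(void)
{
    char a[50] = { 0 };  /* array-style initializer */
    char b[50] = "";     /* string-style initializer */

    return all_zero(a, sizeof a) && all_zero(b, sizeof b)
        && strlen(a) == 0 && strlen(b) == 0;
}
```

Calling initializers_agree() should return 1 with either spelling, so the choice between them is about readability rather than behaviour.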

The problem is that your string is not initialized.
A C string ends with '\0', so you should simply put something like
MyString[0] = '\0';
right after your declaration. This way you make sure that functions like printf work the way you expect them to work.

char MyString[250] = {0};
but if you are actually writing C++, use
std::string

Since you have not initialized the char array to any value, it will contain garbage. It is good programming practice to use something like:
char MyString[250] = "My Array"; // If you know the contents in advance
char MyString[250] = ""; // If you don't intend to fill the char array during initialization

Related

arrays does not null from beginning

I'm a beginner in C .... I have a little code:
#include <stdio.h>
#include <string.h>
int main(){
char str1[100];
char str2[100];
char str3[100];
char str4[100];
puts(str1)
puts(str2);
puts(str3);
puts(str4);
return 0;
}
I got this result:
2
èý(
‘Q]wØ„ÃîþÿÿÿÀ"bwd&bw
I don't know why my arrays are not empty from the beginning. I have to set the first element to '\0' to clear the contents of each array. Can anyone explain this to me? Thanks a lot.
In C, local variables are not initialized automatically if you don't assign values to them. Here your arrays are uninitialized, which means they may contain garbage after their creation.
Yes, you need to explicitly set it to be "empty" like:
char str[100];
str[0] = '\0';
// Now you have an empty string of zero length.
assert(strlen(str) == 0);
// But the size is still 100.
printf("%zu", sizeof(str)); // sizeof yields a size_t, so use %zu rather than %d
Alternatively, you can create an empty string (character array) during the initialization. It has the same size and length as the example above.
char str[100] = "";
As for why it doesn't automatically zero the string, it's because doing so would be costly, and C generally doesn't do costly things that you don't explicitly tell it to do. At a minimum, it would have to set the first element of every array to zero, and there are plenty of occasions where you wouldn't want or need to initialize the array like this. If C always did this for you, then you'd always have that useless overhead that you couldn't get rid of.
As a general rule, C doesn't do anything in the background that you don't explicitly tell it to do, so when you ask for an array, it just gives you an array, and doesn't touch the contents unless you tell it to. It can create a little bit more work for the programmer, but with the benefit of more finely-grained control over exactly what the computer is doing.
Some people would consider that it's a good programming practice to always initialize your variables anyway, and to forget about this kind of tiny cost, and a lot of the time they'll have a good point, but C is deliberately a very flexible and low-level language, and it just doesn't force you to do things like this.
One is getting old when one says "In my days...". But nevertheless, "in my days", people were instructed to first declare variables, and directly afterward initialise them.
In your case, you can do both together and even more thoroughly in one statement.
The solution of Eric Z is the correct one, which I would also use when working the C way. But to be complete, what age_pan describes is that Java inherently does the following:
#include <stdio.h>
int main(int argc, const char * argv[])
{
char str1[100] = { 0 };
char str2[100] = { 0 };
char str3[100] = { 0 };
char str4[100] = { 0 };
puts(str1);
puts(str2);
puts(str3);
puts(str4);
return 0;
}
The difference is that in Eric Z's solution only the first character is set to 0, which means that you create a zero-length, zero-terminated string. The Java method (shown in the code above) initialises every single byte to 0.
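To make that difference visible, here is a small sketch. It is an illustration only: the function names are hypothetical, and the buffer is deliberately pre-filled with 'x' to stand in for whatever leftover data an uninitialized array might hold, since reading truly uninitialized memory would not give a repeatable result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the bytes in buf[0..n) that are not zero. */
static size_t count_nonzero(const char *buf, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] != '\0')
            count++;
    }
    return count;
}

/* Fills a 100-byte buffer with 'x', then "empties" it by terminating at
   index 0. Returns how many stale bytes remain behind the terminator. */
size_t stale_after_terminating(void)
{
    char str[100];
    memset(str, 'x', sizeof str);
    str[0] = '\0';
    assert(strlen(str) == 0);  /* it still reads as an empty string */
    return count_nonzero(str, sizeof str);
}

/* Same experiment, but zeroing every byte, as = { 0 } would. */
size_t stale_after_zeroing(void)
{
    char str[100];
    memset(str, 'x', sizeof str);
    memset(str, 0, sizeof str);
    return count_nonzero(str, sizeof str);
}
```

Both buffers behave as empty strings for puts or strlen; only the bytes after index 0 differ.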
There are pros and cons to the Java initialisation. It leads to sloppy programming (some call it easier programming), and it takes time if you don't need the initialising. On the other hand, I know very few people who need the extra milliseconds that are lost by the initialisation.
Is it necessary to declare variables above the code, and to initialise them? Certainly not. Is it useful? It most certainly is. It avoids all kinds of errors that take a lot of time to debug.
By the way, you are missing a ; after puts(str1) :-)
Kind regards,
PB
I don't think you would have any trouble if the array doesn't start out "empty". In C, uninitialized local variables start with indeterminate values. In Java, by contrast, when you declare a variable the JVM initializes it by default.

C char* pointers pointing to same location where they definitely shouldn't

I'm trying to write a simple C program on Ubuntu using Eclipse CDT (yes, I'm more comfortable with an IDE and I'm used to Eclipse from Java development), and I'm stuck on something weird. In one part of my code, I declare a char array in a function, and by default it points to the same location as one of the inputs, which has nothing to do with that char array. Here is my code:
char* subdir(const char input[], const char dir[]){
[*] int totallen = strlen(input) + strlen(dir) + 2;
char retval[totallen];
strcpy(retval, input);
strcat(retval, dir);
...}
OK, at the part I've marked with [*], there is a breakpoint. Even at that breakpoint, when I check my locals, I see that retval is pointing to the same address as my argument input. That shouldn't even be possible, as input comes from another function and retval is created in this function. Is it me being inexperienced with C and missing something, or is there a bug somewhere in the C compiler?
It seems so obvious to me that they shouldn't point to the same location (which is a valid one, of course; they aren't NULL). As the code goes on, it literally messes up everything; I get random characters and shapes in the console and the program crashes.
I don't think it makes sense to check the address of retval BEFORE it appears, it being a VLA and all (by definition the compiler and the debugger don't know much about it, it's generated at runtime on the stack).
Try checking its address after its point of definition.
EDIT
I just read the "I get random characters and shapes in console". It's obvious now that you are returning the VLA and expecting things to work.
A VLA is only valid inside the block where it was defined. Using it outside is undefined behavior and thus very dangerous. Even if the size were constant, it still wouldn't be valid to return it from the function. In this case you most definitely want to malloc the memory.
What cnicutar said.
I hate people who do this, so I hate me ... but ... Arrays of non-const size are a C99 extension and not supported by C++. Of course GCC has extensions to make it happen.
Under the covers you are essentially doing an _alloca, so your odds of blowing out the stack are proportional to who has access to abuse the function.
Finally, I hope it doesn't actually get returned, because that would be returning a pointer to a stack allocated array, which would be your real problem since that array is gone as of the point of return.
In C++ you would typically use a string class.
In C you would either pass a pointer and length in as parameters, or a pointer to a pointer (or return a pointer) and specify that callers should call free() on it when done. These solutions all suck because they are prone to leaks or truncation or overflow. :/
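A minimal sketch of the "pointer and length" option might look like the following. The name subdir_into and its return convention are assumptions made for illustration, not part of the original code; snprintf is used so that an undersized buffer is reported instead of overflowed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes input followed by dir into out, which has room for outsize bytes.
   Returns 0 on success, -1 if the result (plus terminator) would not fit. */
int subdir_into(char *out, size_t outsize, const char *input, const char *dir)
{
    assert(out != NULL && input != NULL && dir != NULL);

    /* snprintf returns the length the full result would have had. */
    int needed = snprintf(out, outsize, "%s%s", input, dir);
    if (needed < 0 || (size_t)needed >= outsize)
        return -1;
    return 0;
}
```

The caller owns the buffer, so nothing needs to be freed, but the caller does have to check the return value to notice truncation.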
Well, your fundamental problem is that you are returning a pointer to the stack allocated VLA. You can't do that. Pointers to local variables are only valid inside the scope of the function that declares them. Your code results in Undefined Behaviour.
At least I am assuming that somewhere in the ..... in the real code is the line return retval.
You'll need to use heap allocation, or pass a suitably sized buffer to the function.
As well as that, you only need +1 rather than +2 in the length calculation - there is only one null-terminator.
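For the heap-allocation route, a sketch along these lines should work. The name subdir_alloc is hypothetical, and it assumes the elided part of the original function did nothing beyond concatenating the two strings and returning the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string holding input followed by dir,
   or NULL if allocation fails. The caller must free() the result. */
char *subdir_alloc(const char *input, const char *dir)
{
    assert(input != NULL && dir != NULL);

    size_t inlen = strlen(input);
    size_t dirlen = strlen(dir);
    char *retval = malloc(inlen + dirlen + 1);  /* +1 for the single terminator */
    if (retval == NULL)
        return NULL;

    memcpy(retval, input, inlen);
    memcpy(retval + inlen, dir, dirlen + 1);    /* copies dir's terminator too */
    return retval;
}
```

Because the memory comes from malloc rather than the stack, the returned pointer stays valid after the function returns, until the caller passes it to free().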
Try changing retval to a character pointer and allocating your buffer using malloc().
Pass the two string arguments as char * or const char *.
Rather than returning char *, you should just pass another parameter with a string pointer that you have already malloc'd space for.
Return a bool or int describing what happened in the function, and use the parameter you passed to store the result.
Lastly, don't forget to free the memory, since you're having to malloc space for the string on the heap...
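#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// retstr is not const like the other two: the function writes into it.
// The caller must make sure retstr has room for both strings plus the terminator.
bool subdir(const char *input, const char *dir, char *retstr) {
    strcpy(retstr, input);
    strcat(retstr, dir);
    return true;
}

int main(void)
{
    char h[] = "Hello ";
    char w[] = "World!";
    char *greet = malloc(strlen(h) + strlen(w) + 1); // Size of the result plus room for the terminator!
    if (greet == NULL)
        return 1; // allocation failed
    subdir(h, w, greet);
    printf("%s", greet);
    free(greet); // release the heap memory once we are done with it
    return 0;
}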
This will print: "Hello World!" added together by your function.
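Also, when you're creating a string on the fly that has to outlive the function, you must malloc it. The compiler doesn't know how long the two other strings are going to be, so a fixed-size array can't be sized for them, and a C99 variable-length array such as char greet[totallen]; disappears when its block ends, so it cannot be returned.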

What is the best way to empty a char string in C?

Hi, I have a char string
char name[50] = "I love programming";
What happens is that I want to empty this string before I call another function, so that I can store something else in the same array.
Will this work?
name[0] = '\0';
Or is there any way to empty the string without creating a new function or using another library?
Setting first char to nul is perfectly acceptable. But if that string was sensitive in terms of security, then you should zero it out with memset.
Edit:
Answer from Matteo Italia made me dig a bit deeper into this subject. According to this document (and Matteo's answer), memset could be optimized away, and so is not the best option for removing sensitive information from memory. The document has several options, but none of them is both portable and reliable, so it proposes a new standard function, memset_s, just for such purposes. When this was written that function did not exist yet (it was later added to C11 as part of the optional Annex K), so we were stuck with non-portable (SecureZeroMemory), non-reliable (volatile trick), or non-optimal options (secure_memset example).
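For reference, the "volatile trick" mentioned above is usually written roughly as below. This is only a sketch with a made-up function name, wipe_bytes, and as the answer says it is not considered reliable: writing through a volatile pointer discourages the compiler from removing the stores, but it is not the same as a guaranteed secure-erase primitive.

```c
#include <assert.h>
#include <stddef.h>

/* Overwrites n bytes at p with zeros through a volatile pointer, so the
   stores are less likely to be optimized away than a plain memset. */
void wipe_bytes(void *p, size_t n)
{
    assert(p != NULL || n == 0);

    volatile unsigned char *vp = p;
    while (n--)
        *vp++ = 0;
}
```

Where a platform function such as SecureZeroMemory is available, it is probably the better choice for genuinely sensitive data.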
There's really no concept of emptying a char string in C. It's simply a pointer that points to some allocated memory. You can reuse that memory in any way you wish, without "emptying" it first. So, the easiest way to empty it is "just don't".
If you want to explicitly clear all contents of the string for some reason, use the memset approach given in other answers.
If you want to "empty" it in the sense that when it's printed, nothing will be printed, then yes, just set the first char to '\0'.
To conclude, it all depends on what you really want to do. Why do you want to "empty" the string?
Use memset instead. This zeroes the whole buffer; the memory itself is released from the stack anyway when the variable goes out of scope.
memset (name,'\0',sizeof(name));
IIRC, you might use memset this way:
char * myString;
...
size_t len = strlen(myString);
memset (myString, 0,len);
Technically it is correct, for example:
char array[10] = "hello";
printf("%zu\r\n", strlen(array)); // prints 5
array[0] = '\0';
printf("%zu\r\n", strlen(array)); // prints 0
memset(name, 0, 50);
or
bzero(name, 50);
It depends on the effect you want to obtain. If you just want to zero its length, you can do as you said:
*name='\0';
If, instead, you want to clean your string from sensitive data, you should zero it completely with memset (some operating systems also have a "secure" zeroing function that should be guaranteed not to be optimized away by the compiler - see e.g. SecureZeroMemory on Windows).
On the other hand, if the function you are calling just uses the buffer you are passing as an output buffer disregarding its content, you may just leave the buffer as it is.
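A short sketch of that last case: when the next writer treats the array purely as an output buffer, no emptying step is needed at all. The function name reuse_without_emptying is hypothetical, and snprintf stands in for "another function" that stores something in the array.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stores one string in name, then a shorter one, without emptying the
   buffer in between. Returns the length of the final contents. */
size_t reuse_without_emptying(char *name, size_t cap)
{
    assert(name != NULL && cap >= 50);

    snprintf(name, cap, "%s", "I love programming");
    snprintf(name, cap, "%s", "C");  /* overwrites and re-terminates */
    return strlen(name);
}
```

The second write places its own terminator right after the new text, so the leftover characters from the longer string are never seen by string functions.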

Initialization strings in C

I have a question about the correct way to handle the initialization of C strings.
For example, the following code isn't always correct:
char *something;
something = "zzzzzzzzzzzzzzzzzz";
I tested a little, increasing the number of z's, and the program does crash at about two lines' worth, so what is the real size limit of this char array? How can I be sure that it is not going to crash? Is this limit implementation-dependent? Is the following code the correct approach that I must always use?
char something[FIXEDSIZE];
strcpy(something, "zzzzzzzzzzzzzzzzzzz");
As you say, manipulating this string leads to undefined behaviour:
char *something;
something = "zzzzzzzzzzzzzzzzzz";
If you are curious as to why, see "C String literals: Where do they go?".
If you plan to manipulate your string at all, (i.e. if you want it to be mutable) you should use this:
char something[] = "skjdghskfjhgfsj";
Otherwise, simply declare your char * as a const char * to indicate that it points to a constant.
In the second example, the compiler will be smart enough to declare this as an array on the stack of the exact size to hold the string. Thus, the size of this is limited by your stack.
Of course, you will likely want to specify the size anyway, since it is usually useful to know when manipulating the string.
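As a small illustration of the array form, the sketch below uses the literal "hello" and two made-up function names. It shows that the compiler sizes the array to the literal plus its terminator, and that the array is a private, writable copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the size the compiler picks for an array initialized from "hello". */
size_t array_size_from_literal(void)
{
    char something[] = "hello";
    assert(sizeof something == strlen(something) + 1);  /* text plus '\0' */
    return sizeof something;
}

/* Modifies the array copy of the literal; returns 1 if the edit took effect. */
int edit_array_copy(void)
{
    char something[] = "hello";
    something[0] = 'H';  /* fine: this writes to the array, not the literal */
    return strcmp(something, "Hello") == 0;
}
```

The same assignment through a char * that points directly at the literal would be undefined behaviour, which is why declaring such pointers const char * is the safer habit.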
The second is always correct.
The first is correct only if you never change the string, since you've assigned a pointer to fixed data.
The first example is only incorrect in that char *something should really be const char *something. Otherwise, this:
const char *something = "fooooooooooooooooooooooobar";
...should work, and should not crash.
char something[FIXEDSIZE];
...this one, however, can typically crash with a stack overflow if you, well, overflow the stack, which depends on how big that stack is, how big that array is, where this gets called, etc.
The first should never crash. The second will crash as soon as the number of 'z's + 1 exceeds the available space on the stack page, or if you try to return the array from the function.

C's strtok() and read only string literals

char *strtok(char *s1, const char *s2)
Repeated calls to this function break string s1 into "tokens"--that is, the string is broken into substrings, each terminating with a '\0', where the '\0' replaces any characters contained in string s2. The first call uses the string to be tokenized as s1; subsequent calls use NULL as the first argument. A pointer to the beginning of the current token is returned; NULL is returned if there are no more tokens.
Hi,
I have been trying to use strtok just now and found out that if I pass in a char* into s1, I get a segmentation fault. If I pass in a char[], strtok works fine.
Why is this?
I googled around and the reason seems to be something about how char* is read only and char[] is writeable. A more thorough explanation would be much appreciated.
What did you initialize the char * to?
If something like
char *text = "foobar";
then you have a pointer to some read-only characters
For
char text[7] = "foobar";
then you have a seven element array of characters that you can do what you like with.
strtok writes into the string you give it - overwriting the separator character with null and keeping a pointer to the rest of the string.
Hence, if you pass it a read-only string, it will attempt to write to it, and you get a segfault.
Also, because strtok keeps a reference to the rest of the string, it's not reentrant - you can use it only on one string at a time. It's best avoided, really - consider strsep(3) instead - see, for example, here: http://www.rt.com/man/strsep.3.html (although that still writes into the string so has the same read-only/segfault issue)
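One way to use strtok safely even when the input might be a string literal is to tokenize a writable copy. The sketch below is an illustration with a hypothetical helper name, count_tokens; it copies the text to the heap, lets strtok write into the copy, and frees it afterwards.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Counts the tokens in text separated by any character in delims.
   text may be a string literal: strtok only ever sees a writable copy.
   Returns -1 if the copy cannot be allocated. */
int count_tokens(const char *text, const char *delims)
{
    assert(text != NULL && delims != NULL);

    size_t len = strlen(text);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, len + 1);  /* includes the terminator */

    int count = 0;
    for (char *tok = strtok(copy, delims); tok != NULL; tok = strtok(NULL, delims))
        count++;

    free(copy);
    return count;
}
```

Note that strtok treats a run of delimiters as a single separator, so an input like "a,b,,c" yields three tokens rather than four.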
An important point that's inferred but not stated explicitly:
Based on your question, I'm guessing that you're fairly new to programming in C, so I'd like to explain a little more about your situation. Forgive me if I'm mistaken; C can be hard to learn mostly because of subtle misunderstanding in underlying mechanisms so I like to make things as plain as possible.
As you know, when you write out your C program the compiler pre-creates everything for you based on the syntax. When you declare a variable anywhere in your code, e.g.:
int x = 0;
The compiler reads this line of text and says to itself: OK, I need to replace all occurrences in the current code scope of x with a constant reference to a region of memory I've allocated to hold an integer.
When your program is run, this line leads to a new action: I need to set the region of memory that x references to int value 0.
Note the subtle difference here: the memory location that reference point x holds is constant (and cannot be changed). However, the value stored at that location can be changed. You do it in your code through assignment, e.g. x = 15;. Also note that the single line of code actually amounts to two separate commands to the compiler.
When you have a statement like:
char *name = "Tom";
The compiler's process is like this: OK, I need to replace all occurrences in the current code scope of name with a constant reference to a region of memory I've allocated to hold a char pointer value. And it does so.
But there's that second step, which amounts to this: I need to create a constant array of characters which holds the values 'T', 'o', 'm', and the terminating '\0'. Then I need to replace the part of the code which says "Tom" with the memory address of that constant string.
When your program is run, the final step occurs: setting the pointer to char's value (which isn't constant) to the memory address of that automatically created string (which is constant).
So a char * is not read-only. Only a const char * is read-only. But your problem in this case isn't that char *s are read-only, it's that your pointer references a read-only region of memory.
I bring all this up because understanding this issue is the barrier between you looking at the definition of that function from the library and understanding the issue yourself versus having to ask us. And I've somewhat simplified some of the details in the hopes of making the issue more understandable.
I hope this was helpful. ;)
I blame the C standard.
char *s = "abc";
could have been defined to give the same error as
const char *cs = "abc";
char *s = cs;
on grounds that string literals are unmodifiable. But it wasn't, it was defined to compile. Go figure. [Edit: Mike B has gone figured - "const" didn't exist at all in K&R C. ISO C, plus every version of C and C++ since, has wanted to be backward-compatible. So it has to be valid.]
If it had been defined to give an error, then you couldn't have got as far as the segfault, because strtok's first parameter is char*, so the compiler would have prevented you passing in the pointer generated from the literal.
It may be of interest that there was at one time a plan in C++ for this to be deprecated (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/1996/N0896.asc). But 12 years later I can't persuade either gcc or g++ to give me any kind of warning for assigning a literal to non-const char*, so it isn't all that loudly deprecated.
[Edit: aha: -Wwrite-strings, which isn't included in -Wall or -Wextra]
In brief:
char *s = "HAPPY DAY";
printf("\n %s ", s);
s = "NEW YEAR"; /* Valid */
printf("\n %s ", s);
s[0] = 'c'; /* Invalid */
If you look at your compiler documentation, odds are there is an option you can set to make those strings writable.

Resources