What is the best way to empty a char string in C? - c

Hi, I have a char string:
char name[50] = "I love programming";
I want to empty this string before I call another function, so that I can store something else in the same array.
Will this work?
name[0] = '\0';
Or is there any way to empty the string without writing a new function or using another library?

Setting first char to nul is perfectly acceptable. But if that string was sensitive in terms of security, then you should zero it out with memset.
Edit:
Matteo Italia's answer made me dig a bit deeper into this subject. According to this document (and Matteo's answer), a call to memset can be optimized away, so it is not the best option for removing sensitive information from memory. The document lists several options, but none of them is both portable and reliable, so it proposes a new standard function, memset_s, just for this purpose. (memset_s was later added to C11 as part of the optional Annex K, but many implementations do not provide it.) In practice we're stuck with non-portable (SecureZeroMemory), non-reliable (volatile trick), or non-optimal (secure_memset example) options.
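As a rough illustration of the "volatile trick" mentioned above, here is a minimal sketch. The name secure_clear is hypothetical, not a standard function. Writing through a volatile pointer discourages the compiler from discarding the stores, but as the answer notes, this approach is not guaranteed to be reliable on every compiler.

```c
#include <stddef.h>

/* Hypothetical helper (not part of any standard library): overwrite a
 * buffer with zeros through a volatile pointer, so the compiler is less
 * likely to treat the stores as dead code and remove them. */
void secure_clear(void *buf, size_t len)
{
    volatile unsigned char *p = buf;

    while (len--) {
        *p++ = 0;
    }
}
```

It would be called with the full array size, for example secure_clear(name, sizeof name), just before the buffer goes out of use.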

There's really no concept of emptying a char string in C. It's simply a pointer that points to some allocated memory. You can reuse that memory in any way you wish, without "emptying" it first. So, the easiest way to empty it is "just don't".
If you want to explicitly clear all contents of the string for some reason, use the memset approach given in other answers.
If you want to "empty" it in the sense that when it's printed, nothing will be printed, then yes, just set the first char to '\0'.
To conclude, it all depends on what you really want to do. Why do you want to "empty" the string?

Use memset instead. This just zeroes the buffer; the memory itself is released from the stack anyway when the variable goes out of scope.
memset (name,'\0',sizeof(name));

IIRC, you might use memset this way:
char * myString;
...
size_t len = strlen(myString);
memset (myString, 0,len);

Technically it is correct, for example:
char array[10] = "hello";
printf("%zu\r\n", strlen(array)); // prints 5
array[0] = '\0';
printf("%zu\r\n", strlen(array)); // prints 0

memset(name, 0, 50);
or
bzero(name, 50);

It depends on the effect you want to obtain. If you just want to zero its length you can do, as you said:
*name='\0';
If, instead, you want to clean your string from sensitive data, you should zero it completely with memset (some operating systems also have a "secure" zeroing function that should be guaranteed not to be optimized away by the compiler - see e.g. SecureZeroMemory on Windows).
On the other hand, if the function you are calling just uses the buffer you are passing as an output buffer disregarding its content, you may just leave the buffer as it is.
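To make the difference between the first two options concrete, here is a small sketch. The helper names empty_string and wipe_buffer are made up for illustration. The first only changes where the string ends; the second overwrites every byte of the array.

```c
#include <string.h>

/* Hypothetical helper: make the string empty by terminating it at index 0.
 * The remaining bytes of the buffer keep their old contents. */
void empty_string(char *s)
{
    s[0] = '\0';
}

/* Hypothetical helper: overwrite every byte of the buffer with zero.
 * size must be the full size of the array, e.g. sizeof name. */
void wipe_buffer(char *buf, size_t size)
{
    memset(buf, 0, size);
}
```

After empty_string(name), strlen(name) is 0 but the old characters from index 1 onward are still in memory; after wipe_buffer(name, sizeof name) they are gone (subject to the optimization caveat discussed in the other answer).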

Related

How strcpy works behind the scenes?

This may be a very basic question for some. I was trying to understand how strcpy works actually behind the scenes. for example, in this code
#include <stdio.h>
#include <string.h>
int main ()
{
char s[6] = "Hello";
char a[20] = "world isnsadsdas";
strcpy(s,a);
printf("%s\n",s);
printf("%zu\n", sizeof(s));
return 0;
}
I declared s as a fixed-size array smaller than the source, so I thought it wouldn't print the whole string, but it did print world isnsadsdas. So I thought strcpy might be allocating more space if the destination is smaller than the source. But when I check sizeof(s), it is still 6, yet it prints more than that. How does that actually work?
You've just caused undefined behaviour, so anything can happen. In your case, you're getting lucky and it's not crashing, but you shouldn't rely on that happening. Here's a simplified strcpy implementation (but it's not too far off from many real ones):
char *strcpy(char *d, const char *s)
{
char *saved = d;
while (*s)
{
*d++ = *s++;
}
*d = 0;
return saved;
}
sizeof is just returning you the size of your array from compile time. If you use strlen, I think you'll see what you expect. But as I mentioned above, relying on undefined behaviour is a bad idea.
http://natashenka.ca/wp-content/uploads/2014/01/strcpy8x11.png
strcpy is considered dangerous for reasons like the one you are demonstrating. The two buffers you created are local variables stored in the stack frame of the function. Here is roughly what the stack frame looks like:
http://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/Call_stack_layout.svg/342px-Call_stack_layout.svg.png
FYI things are put on top of the stack meaning it grows backwards through memory (This does not mean the variables in memory are read backwards, just that newer ones are put 'behind' older ones). So that means if you write far enough into the locals section of your function's stack frame, you will write forward over every other stack variable after the variable you are copying to and break into other sections, and eventually overwrite the return pointer. The result is that if you are clever, you have full control of where the function returns. You could make it do anything really, but it isn't YOU that is the concern.
As you seem to know, since you made your first buffer 6 chars long for a 5-character string, C strings end in a null byte \x00. The strcpy function copies bytes until the source byte is 0, but it does not check that the destination is that long, which is why it can copy past the boundary of the array. This is also why your print reads the buffer past its size: it reads until \x00. Interestingly, the strcpy may have written into the data of a, depending on the order the compiler gave the arrays in the stack, so a fun exercise could be to also print a and see if you get something like 'snsadsdas' (but I can't be sure what it would look like even if it is polluting a, because there are sometimes bytes in between the stack entries for various reasons).
If this buffer holds say, a password to check in code with a hashing function, and you copy it to a buffer in the stack from wherever you get it (a network packet if a server, or a text box, etc) you very well may copy more data from the source than the destination buffer can hold and give return control of your program to whatever user was able to send a packet to you or try a password. They just have to type the right number of characters, and then the correct characters that represent an address to somewhere in ram to jump to.
You can use strcpy if you check the bounds and maybe trim the source string, but it is considered bad practice. There are functions that take a maximum length, like strncpy (http://www.cplusplus.com/reference/cstring/strncpy/), though note that strncpy does not null-terminate the destination if the source is too long.
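As a sketch of what "checking the bounds" could look like, here is a hypothetical bounded_copy helper (the name and return convention are assumptions, not a standard API). It never writes more than dst_size bytes and always leaves the destination null-terminated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst without writing more than
 * dst_size bytes, always leaving dst null-terminated.
 * Returns 0 if the whole string fit, -1 if it was truncated
 * (or if dst_size is 0 and nothing could be written). */
int bounded_copy(char *dst, size_t dst_size, const char *src)
{
    size_t src_len = strlen(src);

    if (dst_size == 0) {
        return -1;
    }
    if (src_len >= dst_size) {
        memcpy(dst, src, dst_size - 1);   /* as much as fits */
        dst[dst_size - 1] = '\0';
        return -1;
    }
    memcpy(dst, src, src_len + 1);        /* includes the terminator */
    return 0;
}
```

With the arrays from the question, bounded_copy(s, sizeof s, a) would report truncation and leave s holding only the first five characters of a, instead of writing past the end of s.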
Oh, and lastly, this is all called a buffer overflow. Some compilers place a small random value (a stack canary) between a function's local buffers and its saved return address. Before the function returns, compiler-inserted code checks that value against a saved copy and terminates the program if they differ. This solves a lot of security problems, but it is still possible to copy bytes far enough into the stack to overwrite the pointer to the function that handles what happens when those bytes have been changed, letting you do the same thing. It just becomes a lot harder to do right.
In C there is no bounds checking of arrays; it's a trade-off that buys better performance at the risk of shooting yourself in the foot.
strcpy() doesn't care whether the target buffer is big enough so copying too many bytes will cause undefined behavior.
That is one of the reasons a newer version of strcpy, strcpy_s(), was introduced, where you specify the target buffer size.
Note that sizeof(s) is determined at compile time. Use strlen() to find the number of characters s actually holds. When you perform strcpy(), the destination string is overwritten by the source string, so your output won't be "Helloworld isnsadsdas".
#include <stdio.h>
#include <string.h>
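int main(void)
{
char s[20] = "Hello"; /* large enough to hold a, so strcpy stays in bounds */
char a[20] = "world isnsadsdas";
strcpy(s,a);
printf("%s\n",s);
printf("%zu\n", strlen(s)); /* length of the copied string */
return 0;
}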
You are relying on undefined behaviour, inasmuch as the compiler has chosen to place the two arrays where your code happens to work. This may not work in future.
As to the sizeof operator, this is figured out at compile time.
Once you use adequate array sizes you need to use strlen to fetch the length of the strings.
The best way to understand how strcpy works behind the scene is...reading its source code!
You can read the source for GLibC : http://fossies.org/dox/glibc-2.17/strcpy_8c_source.html . I hope it helps!
At the end of every string/character array there is a null terminator character '\0' which marks the end of the string/character array.
strcpy() performs its task until it sees the '\0' character.
printf() also performs its task until it sees the '\0' character.
sizeof() on the other hand is not interested in the content of the array, only its allocated size (how big it is supposed to be), thus not taking into consideration where the string/character array actually ends (how big it actually is).
As opposed to sizeof(), there is strlen() that is interested in how long the string actually is (not how long it was supposed to be) and thus counts the number of characters until it reaches the end ('\0' character) where it stops (it doesn't include the '\0' character).
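The difference between the two can be seen in a short sketch. The struct sizes type and the measure_hello function are made up for this illustration; they just capture what sizeof and strlen report for the same array.

```c
#include <stddef.h>
#include <string.h>

struct sizes {
    size_t capacity;  /* what sizeof reports: bytes in the array */
    size_t length;    /* what strlen reports: chars before '\0' */
};

/* Hypothetical helper that measures a 20-byte array holding "Hello". */
struct sizes measure_hello(void)
{
    char s[20] = "Hello";
    struct sizes result;

    result.capacity = sizeof s;   /* fixed at compile time: 20 */
    result.length = strlen(s);    /* counted at run time: 5 */
    return result;
}
```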
Better Solution is
char *strcpy(char *p,char const *q)
{
char *saved=p;
while(*p++=*q++);
return saved;
}

Working with Pointers and Strcpy in C

I'm fairly new to the concept of pointers in C. Let's say I have two variables:
char *arch_file_name;
char *tmp_arch_file_name;
Now, I want to copy the value of arch_file_name to tmp_arch_file_name and add the word "tmp" to the end of it. I'm looking at them as strings, so I have:
strcpy(&tmp_arch_file_name, &arch_file_name);
strcat(tmp_arch_file_name, "tmp");
However, when strcat() is called, both of the variables change and are the same. I want one of them to change and the other to stay intact. I have to use pointers because I use the names later for the fopen(), rename() and delete() functions. How can I achieve this?
What you want is:
strcpy(tmp_arch_file_name, arch_file_name);
strcat(tmp_arch_file_name, "tmp");
You are just copying the pointers (and other random bits until you hit a 0 byte) in the original code, that's why they end up the same.
As shinkou correctly notes, make sure tmp_arch_file_name points to a buffer of sufficient size (it's not clear if you're doing this in your code). Simplest way is to do something like:
char buffer[256];
char* tmp_arch_file_name = buffer;
Before you use pointers, you need to allocate memory. Assuming that arch_file_name is assigned a value already, you should calculate the length of the result string, allocate memory, do strcpy, and then strcat, like this:
char *arch_file_name = "/temp/my.arch";
// Add lengths of the two strings together; add one for the \0 terminator:
char * tmp_arch_file_name = malloc((strlen(arch_file_name)+strlen("tmp")+1)*sizeof(char));
strcpy(tmp_arch_file_name, arch_file_name);
// ^ this and this ^ are pointers already; no ampersands!
strcat(tmp_arch_file_name, "tmp");
// use tmp_arch_file_name, and then...
free(tmp_arch_file_name);
First, you need to make sure those pointers actually point to valid memory. As they are, they're either NULL pointers or arbitrary values, neither of which will work very well:
char *arch_file_name = "somestring";
char tmp_arch_file_name[100]; // or malloc
Then you cpy and cat, but with the pointers, not pointers-to-the-pointers that you currently have:
strcpy (tmp_arch_file_name, arch_file_name); // i.e., no "&" chars
strcat (tmp_arch_file_name, "tmp");
Note that there is no bounds checking going on in this code - the sample doesn't need it since it's clear that all the strings will fit in the allocated buffers.
However, unless you totally control the data, a more robust solution would check sizes before blindly copying or appending. Since it's not directly related to the question, I won't add it in here, but it's something to be aware of.
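For reference, one possible shape of such a size check is sketched below. The append_suffix helper is hypothetical; it relies on snprintf returning the number of characters the full result would need, which makes truncation detectable.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write name followed by suffix into dst.
 * Returns 0 on success, -1 if the result would not fit in dst_size bytes
 * (dst may then hold a truncated string) or if formatting failed. */
int append_suffix(char *dst, size_t dst_size, const char *name, const char *suffix)
{
    int needed = snprintf(dst, dst_size, "%s%s", name, suffix);

    if (needed < 0 || (size_t)needed >= dst_size) {
        return -1;
    }
    return 0;
}
```

A call such as append_suffix(tmp_arch_file_name, sizeof tmp_arch_file_name, arch_file_name, "tmp") only works as written when tmp_arch_file_name is an array; for a malloc'd buffer the allocated size has to be passed explicitly.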
The & operator is the address-of operator, that is it returns the address of a variable. However using it on a pointer returns the address of where the pointer is stored, not what it points to.

String starting state in C

Sorry if this is a bit of a starter question, but I am pretty new to C. I am using the GCC compiler. When I write a program with a string in it, if the string is beyond a certain length it appears to start with some contents. I am worried about just overwriting it, as it could be in use by another program. Here is some example code that shows the issue:
#include <stdio.h>
// Using the GCC Compiler
// Why is there already something in MyString?
int main(void) {
char MyString[250];
printf("%s", MyString);
getch();
return 0;
}
How do I SAFELY avoid this issue? Thanks for your help.
Why is there already something in MyString?
MyString is not initialized and can contain anything.
To initialize to an empty string:
char MyString[250] = { 0 };
or as pointed out by unwind in his answer:
char MyString[250] = "";
which is more readable (and consistent with the following).
To initialize to a string:
char myString[250] = "some-string";
I am worried about just overwriting it, as it could be in use by another program
Each running instance of your program will have its own myString.
For some reason many are recommending the array-style initialization of
char myString[50] = { 0 };
however, since this array is intended to be used as a string, I find it far clearer and more intuitive (and simpler syntactically) to use a string initializer:
char myString[50] = "";
This does exactly the same thing, but makes it quite a lot clearer that what you intend to initialize the array as is in fact an empty string.
The situation you're seeing with "random" data is just what happens to be in the array, since you are not initializing it you simply get what happens to be there. This does not mean that the memory is being used by some other program at the same time, so you don't need to worry about that. You do need to worry about handing a pointer to an array of char that is not properly 0-terminated to any C function expecting a string, though.
Technically you are then invoking undefined behavior, which is something you should avoid. It can easily crash your program, since there's no telling how far away into memory you might end up. Operating systems are free to kill processes that try to access memory that they're not allowed to touch.
Properly initializing the array to an empty string avoids this issue.
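A quick way to convince yourself that the empty-string initializer zeroes the whole array, not just the first byte, is a check like the following. The function name nonzero_bytes_after_init is made up for this sketch.

```c
#include <stddef.h>

/* Hypothetical helper: count the non-zero bytes in an array that was
 * initialized with an empty string literal. */
size_t nonzero_bytes_after_init(void)
{
    char MyString[250] = "";   /* first byte is '\0', the rest are zero-filled too */
    size_t count = 0;

    for (size_t i = 0; i < sizeof MyString; i++) {
        if (MyString[i] != '\0') {
            count++;
        }
    }
    return count;
}
```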
The problem is that your string is not initialized.
A C string ends with '\0', so you should simply put something like
MyString[0] = '\0';
after your declaration. This way you make sure that functions like printf work the way you expect them to work.
char MyString[250] = {0};
but if you are using C++, prefer
std::string
Since you have not initialized the char array to any value, it'll contain some garbage value. It's good programming practice to use something like:
char MyString[250] = "My Array"; // If you know the array to be used
char MyString[250] = ""; // If you don't intend to fill the char array data during initialization

Strcpy() corrupts the copied string in Solaris but not Linux

I'm writing a C code for a class. This class requires that our code compile and run on the school server, which is a sparc solaris machine. I'm running Linux x64.
I have this line to parse (THIS IS NOT ACTUAL CODE BUT IS INPUT TO MY PROGRAM):
while ( cond1 ){
I need to capture the "while" and the "cond1" into separate strings. I've been using strtok() to do this. In Linux, the following lines:
char *cond = NULL;
cond = (char *)malloc(sizeof(char));
memset(cond, 0, sizeof(char));
strcpy(cond, strtok(NULL, ": \t\(){")); //already got the "while" out of the line
will correctly capture the string "cond1". Running this on the Solaris machine, however, gives me the string "cone1".
Note that in plenty of other cases within my program, strings are being copied correctly. (For instance, the "while") was captured correctly.
Does anyone know what is going on here?
The line:
cond = (char *)malloc(sizeof(char));
allocates exactly one char for storage, into which you are then copying more than one - strcpy needs to put, at a bare minimum, the null terminator but, in your case, also the results of your strtok as well.
The reason it may work on a different system is that some implementations of malloc will allocate at a certain resolution (e.g., a multiple of 16 bytes) no matter what actual value you ask for, so you may have some free space there at the end of your buffer. But what you're attempting is still very much undefined behaviour.
The fact that the undefined behaviour may be to work sometimes in no way abrogates your responsibility to avoid such behaviour.
Allocate enough space for storing the results of your strtok and you should be okay.
The safest way to do this is to dynamically allocate the space so that it's at least as big as the string you're passing to strtok. That way there can be no possibility of overflow (other than weird edge cases where other threads may modify the data behind your back but, if that were the case, strtok would be a very bad choice anyway).
Something like (if instr is your original input string):
cond = (char*)malloc(strlen(instr)+1);
This guarantees that any token extracted from instr will fit within cond.
As an aside, sizeof(char) is always 1 by definition, so you don't need to multiply by it.
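Putting the advice above together, a minimal sketch might look like this. The copy_token helper and its 0-based token index are assumptions made for illustration. It works on a private copy of the input, because strtok modifies the string it scans, and it sizes the result from the token itself.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a freshly allocated copy of the n-th token
 * (0-based) of instr, or NULL if there is no such token or allocation
 * fails. The caller frees the result. */
char *copy_token(const char *instr, const char *delims, size_t n)
{
    char *work = malloc(strlen(instr) + 1);   /* private copy for strtok */
    char *result = NULL;
    char *tok;

    if (work == NULL) {
        return NULL;
    }
    strcpy(work, instr);

    tok = strtok(work, delims);
    while (tok != NULL && n > 0) {
        tok = strtok(NULL, delims);
        n--;
    }
    if (tok != NULL) {
        result = malloc(strlen(tok) + 1);     /* token plus terminator */
        if (result != NULL) {
            strcpy(result, tok);
        }
    }
    free(work);
    return result;
}
```

For the input line in the question, copy_token(line, ": \t(){", 1) would yield a copy of "cond1" in a buffer that is exactly large enough.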
cond is being allocated one byte. strcpy is copying at least two bytes to that allocation. That is, you are writing more bytes into the allocation than there is room for.
One way to fix it is to use char *cond = malloc (1000); instead of what you've got.
You only allocated memory for 1 character, but you are trying to store at least 6 characters (you need space for the terminating \0). The quick and dirty way to solve this is to just say
char cond[128];
instead of malloc.

Disabling NUL-termination of strings in GCC

Is it possible to globally disable NUL-terminated strings in GCC?
I am using my own string library, and I have absolutely no need for the final NUL characters as it already stores the proper length internally in a struct.
However, if I wanted to append 10 strings, this would mean that 10 bytes are unnecessarily allocated on the stack. With wide strings it is even worse: As for x86, there are 40 bytes wasted; and for x86_64, 80 bytes!
I defined a macro to add those stack-allocated strings to my struct:
#define AppendString(ppDest, pSource) \
AppendSubString(ppDest, (*ppDest)->len + 1, pSource, 0, sizeof(pSource) - 1)
Using sizeof(...) - 1 works quite well but I am wondering whether I could get rid of NUL termination in order to save a few bytes?
This is pretty awful, but you can explicitly specify the length of every character array constant:
char my_constant[6] = "foobar";
assert(sizeof my_constant == 6);
wchar_t wide_constant[6] = L"foobar";
assert(sizeof wide_constant == 6*sizeof(wchar_t));
I understand you're only dealing with strings declared in your program:
....
char str1[10];
char str2[12];
....
and not with text buffers you allocate with malloc() and friends otherwise sizeof is not going to help you.
Anyway, I would think twice about removing the \0 at the end: you would lose compatibility with the C standard library functions.
Unless you are going to rewrite every single string function for your library (sprintf, for example), are you sure you want to do it?
I can't remember the details, but when I do
char my_constant[5]
it is possible that it will reserve 8 bytes anyway, because some machines can't address the middle of a word.
It's nearly always best to leave this sort of thing to the compiler and let it handle the optimisation for you, unless there is a really, really good reason not to.
If you're not using any of the Standard Library function that deal with strings you can forget about the NUL terminating byte.
No strlen(), no fgets(), no atoi(), no strtoul(), no fopen(), no printf() with the %s conversion specifier ...
Declare your "not quite C strings" with just the needed space;
struct NotQuiteCString { /* ... */ };
struct NotQuiteCString variable;
variable.data = malloc(5); /* exactly 5 bytes: no room for a terminating NUL */
variable.data[0] = 'H'; /* ... */ variable.data[4] = 'o'; /* "Hello" */
Admittedly, this is only worthwhile if you are really low on memory. Otherwise I don't recommend doing so.
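To illustrate what working with such "not quite C strings" involves, here is a sketch under assumed details. The field names len and data and the nqcs_ functions are invented here, since the struct's contents are elided in the answer. The point is that every operation has to carry the length explicitly, using memcpy and memcmp rather than strcpy and strcmp.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout for a string that stores its length explicitly
 * and has no terminating NUL. */
struct NotQuiteCString {
    size_t len;
    char *data;
};

/* Build one from the first len bytes of src. Returns 0 on success,
 * -1 if allocation fails. The caller frees out->data. */
int nqcs_init(struct NotQuiteCString *out, const char *src, size_t len)
{
    out->data = malloc(len ? len : 1);   /* avoid a zero-size malloc */
    if (out->data == NULL) {
        return -1;
    }
    memcpy(out->data, src, len);         /* exactly len bytes, no NUL */
    out->len = len;
    return 0;
}

/* Compare by length and bytes; strcmp would read past the end. */
int nqcs_equal(const struct NotQuiteCString *a, const struct NotQuiteCString *b)
{
    return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}
```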
It seems the most proper way to do what you are describing is to prepare a minimal 'listing' file of the form:
string1_constant_name "str1"
string2_constant_name "str2"
...
Then construct a utility that processes your file and generates declarations such as
const char string1_constant[4] = "str1";
Of course I wouldn't recommend doing this by hand, because otherwise you can get into trouble after any string change.
So now you have non-terminated strings (because of the fixed-size auto-generated arrays), and sizeof() works for every variable. This solution seems acceptable.
The benefits are easy localization, the possibility of adding some level of checks to lower the risk of this solution, and savings in the read-only data segment.
The drawback is the need to include all such string constants in every module (as an include, to keep sizeof() known). So this only makes sense if your linker merges such symbols (some don't).
Aren't these similar to Pascal-style strings, or Hollerith strings? I think this is only useful if you actually want the string data to preserve NULs, in which case you're really pushing around arbitrary memory, not "strings" per se.
The question uses false assumptions - it assumes that storing the length (e.g. implicitly by passing it as a number to a function) incurs no overhead, but that's not true.
While one might save space by not storing the 0-byte (or wchar), the size must be stored somewhere, and the example hints that it is passed as a constant argument to a function somewhere, which almost certainly takes more space, in code. If the same string is used multiple times, the overhead is per use, not per-string.
Having a wrapper that uses strlen to determine the length of a string and isn't inlined will almost certainly save more space.

Resources