Is it possible to globally disable NUL-terminated strings in GCC?
I am using my own string library, and I have absolutely no need for the final NUL characters as it already stores the proper length internally in a struct.
However, if I wanted to append 10 strings, this would mean that 10 bytes are unnecessarily allocated on the stack. With wide strings it is even worse: As for x86, there are 40 bytes wasted; and for x86_64, 80 bytes!
I defined a macro to add those stack-allocated strings to my struct:
#define AppendString(ppDest, pSource) \
AppendSubString(ppDest, (*ppDest)->len + 1, pSource, 0, sizeof(pSource) - 1)
Using sizeof(...) - 1 works quite well but I am wondering whether I could get rid of NUL termination in order to save a few bytes?
This is pretty awful, but you can explicitly specify the length of every character array constant:
char my_constant[6] = "foobar";
assert(sizeof my_constant == 6);
wchar_t wide_constant[6] = L"foobar";
assert(sizeof wide_constant == 6*sizeof(wchar_t));
I understand you're only dealing with strings declared in your program:
....
char str1[10];
char str2[12];
....
and not with text buffers you allocate with malloc() and friends otherwise sizeof is not going to help you.
Anyway, I would think twice about removing the \0 at the end: you would lose compatibility with the C standard library functions.
Unless you are going to rewrite every single string function your library needs (sprintf, for example), are you sure you want to do it?
I can't remember the details, but when I do
char my_constant[5]
it is possible that it will reserve 8 bytes anyway, because some machines can't address the middle of a word.
It's nearly always best to leave this sort of thing to the compiler and let it handle the optimisation for you, unless there is a really, really good reason not to.
If you're not using any of the Standard Library functions that deal with strings, you can forget about the NUL terminating byte.
No strlen(), no fgets(), no atoi(), no strtoul(), no fopen(), no printf() with the %s conversion specifier ...
Declare your "not quite C strings" with just the needed space;
struct NotQuiteCString { /* ... */ };
struct NotQuiteCString variable;
variable.data = malloc(5);
variable.data[0] = 'H'; /* ... */ variable.data[4] = 'o'; /* "Hello", no terminator */
This is only worthwhile if you are really low on memory. Otherwise I don't recommend doing so.
It seems the most proper way to do what you are describing is:
Prepare a minimal 'listing' file of the form:
string1_constant_name "str1"
string2_constant_name "str2"
...
Write a utility that processes this file and generates declarations such as
const char string1_constant[4] = "str1";
Of course I wouldn't recommend doing this by hand, because otherwise you can get into trouble after any string change.
So now you have non-terminated strings, thanks to the fixed-size auto-generated arrays, and sizeof() also works for every variable. This solution seems acceptable.
The benefits are easy localization, the possibility of adding some checks to lower the risk of this approach, and savings in the read-only data segment.
The drawback is the need to include all of these string constants in every module (as an include, so that sizeof() stays known). So this only makes sense if your linker merges such symbols (some don't).
Aren't these similar to Pascal-style strings, or Hollerith strings? I think this is only useful if you actually want the string data to preserve NULs, in which case you're really pushing around arbitrary memory, not "strings" per se.
The question uses false assumptions - it assumes that storing the length (e.g. implicitly by passing it as a number to a function) incurs no overhead, but that's not true.
While one might save space by not storing the 0-byte (or wchar), the size must be stored somewhere, and the example hints that it is passed as a constant argument to a function somewhere, which almost certainly takes more space, in code. If the same string is used multiple times, the overhead is per use, not per-string.
Having a wrapper that uses strlen to determine the length of a string and isn't inlined will almost certainly save more space.
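A minimal sketch of that idea, under stated assumptions: struct counted_buf, append_bytes and append_cstr are hypothetical names invented for this example, not the asker's actual library. The wrapper computes the length once, out of line, so each call site passes only a pointer rather than a pointer plus a length constant.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted buffer; not the asker's actual struct. */
struct counted_buf {
    char   data[64];
    size_t len;
};

/* Appends n raw bytes; returns 0 on success, -1 if they do not fit. */
int append_bytes(struct counted_buf *b, const char *src, size_t n)
{
    if (n > sizeof b->data - b->len)
        return -1;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Out-of-line wrapper: the length is computed here with strlen, so
   call sites do not have to pass it as an extra argument. */
int append_cstr(struct counted_buf *b, const char *s)
{
    return append_bytes(b, s, strlen(s));
}
```

A call then looks like append_cstr(&buf, "foo"). The literal still carries its terminating NUL in the data segment; whether that costs less than passing the length at every call depends on the target and the compiler, so it is worth measuring.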
Related
Is it possible to have strings with NULL character somewhere except the end and work with them? Like get their size, use strcat, etc?
I have some ideas:
1) Write your own function for getting length (or something else), which is going to iterate over a string. If it meets a NULL char, it is going to check the next char of the string. If it is not NULL - continue counting chars. But it may (and WILL!) eventually lead to situation when you are reading memory OUTSIDE of the char array. So it is a bad idea.
2) Use sizeof(array)/sizeof(type), eg sizeof(input)/sizeof(char). That is going to work pretty good I think.
Do you have any other ideas on how this can be done? Maybe there are some function which I am not aware of (C newbie alert :))?
The only really safe method I can think of is to use "Pascal"-type strings (that is, something that has a string header and assorted other data associated with it).
Something like this:
typedef struct {
int len, allocated;
char *data;
} my_string;
You would then have to implement pretty much every string manipulation function yourself. Keeping both the "length of the string" and "the size of the allocation" allows you to have an allocation that's larger than the current contents, this may make repeated string concatenation cheaper (allows an amortized O(1) append).
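As a rough sketch of how the amortized append could look: the function name my_string_append, the starting capacity of 16 and the doubling policy are assumptions of this example, and the struct uses size_t where the one above uses int.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t len, allocated;   /* size_t rather than int: an assumption of this sketch */
    char  *data;
} my_string;

/* Appends n bytes, which may include zero bytes.
   Returns 0 on success, -1 if the allocation fails.
   Overflow of the capacity computation is not checked in this sketch. */
int my_string_append(my_string *s, const char *src, size_t n)
{
    if (n == 0)
        return 0;
    if (n > s->allocated - s->len) {
        size_t cap = s->allocated ? s->allocated : 16;
        while (cap - s->len < n)
            cap *= 2;                     /* doubling keeps appends amortized O(1) */
        char *p = realloc(s->data, cap);  /* realloc(NULL, cap) behaves like malloc */
        if (p == NULL)
            return -1;
        s->data = p;
        s->allocated = cap;
    }
    memcpy(s->data + s->len, src, n);
    s->len += n;
    return 0;
}
```

Start from a zero-initialized my_string (len 0, allocated 0, data NULL) and free data when done. Because the length is explicit, appending "ab\0cd" with n set to 5 stores all five bytes.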
You can have an array of char, either statically or dynamically allocated, that contains a zero byte in the middle, but only the part up to and including the zero can be considered a "string" in the standard C sense. Only that part will be recognized or considered by the standard library's string functions.
You can use a different terminator -- say two zeroes in a row -- and write your own string functions, but that just pushes off the problem. What happens when you need two zeroes in the middle of your string? In any case, you need to exercise even more care in this case than in the ordinary string case to ensure that your custom strings are properly terminated. You also have to be certain to avoid using them with the standard string functions.
If your special strings are stored in char array of known size then you can get the length of the overall array via sizeof, but that doesn't tell you what portion of the array contains meaningful data. It also doesn't help with any of the other string functions you might want to perform, and it does nothing for you if your handle on the pseudo-strings is a char *.
If you are contemplating custom string functions anyway, then you should consider string objects that have an explicit length stored with them. For example:
struct my_string {
unsigned allocated, length;
char *contents;
};
Your custom functions then handle objects of that type, being certain to do the right thing with the length member. There is no explicit terminator, so these strings can contain any char value. Also, you can be certain not to mix these up with standard strings.
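For illustration, here is what one such custom function might look like. The name my_string_equal is hypothetical; it compares by stored length with memcmp, so a zero byte in the middle does not end the comparison the way it would with strcmp.

```c
#include <string.h>

struct my_string {
    unsigned allocated, length;
    char *contents;
};

/* Returns 1 if both strings hold the same bytes, 0 otherwise.
   Zero bytes inside the contents are compared like any other byte. */
int my_string_equal(const struct my_string *a, const struct my_string *b)
{
    if (a->length != b->length)
        return 0;
    return a->length == 0 ||
           memcmp(a->contents, b->contents, a->length) == 0;
}
```

Two three-byte strings holding 'a', 0, 'b' and 'a', 0, 'c' compare as different here, whereas strcmp on the same buffers would stop at the zero and report them equal.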
As long as you store the length of the array of chars then you can have strings with nul characters or even without a terminating nul.
struct MyString
{
int length;
char* buffer;
};
And then you would have to write all your equivalent functions for managing the string.
The bstring library http://bstring.sourceforge.net and Microsoft's BSTR (which uses wide chars) are existing implementations that work in this way and also offer some compatibility with C-style strings.
pros - getting the length of the string is quick
cons - the strings need to be dynamically allocated.
I have the following code and output:
#include <stdio.h>
#include <string.h>
int main()
{
char pal_tmp[4];
printf("Size of String Variable %zu\n",sizeof(pal_tmp));
strcpy(pal_tmp,"123456789");
printf("Printing Extended Ascii: %s\n",pal_tmp);
printf("Size of String Variable %zu\n",sizeof(pal_tmp));
}
Output:
Size of String Variable 4
Printing Extended Ascii: 123456789
Size of String Variable 4
My question is: why does a string variable (character array) accept more characters than its capacity? Shouldn't it just print 1234 instead of 123456789?
Am I doing something wrong?
Well yes. You are doing something wrong. You're putting more characters into the string than you are supposed to. According to the C specification, that is wrong and referred to as "undefined behaviour".
However, that very same C specification does not require the compiler (nor runtime) to actually flag that as an error. "Undefined behaviour" means that anything could happen, including getting an error, random data corruption or the program actually working.
In this particular case, your call to strcpy simply writes outside the reserved memory and will overwrite whatever happens to be stored after the array. There is probably nothing of importance there, which is why nothing bad seems to happen.
As an example of what would happen if you do have something relevant after the array, let's add a variable to see what happens to it:
#include <stdio.h>
#include <string.h>
int main( void )
{
char foo[4];
int bar = 0;
strcpy( foo, "a long string here" );
printf( "%d\n", bar );
return 0;
}
When run, I get the result 1701322855 on my machine (the results on yours will likely be different).
The call to strcpy clobbered the content of the bar variable, resulting in the random output that you saw.
Well yes, you are overwriting memory that doesn't belong to that buffer (pal_tmp). In some cases this might work, in others you might get a segfault and your program will crash. In the case you showed, it looks like you happened to not overwrite anything "useful". If you tried to write more, you'll be more likely to overwrite something useful and crash the program.
C arrays of char don't have a predefined size, as far as the string handling functions are concerned. The functions will happily write off the end of the array into other variables (bad), or malloc's bookkeeping data (worse), or the call stack's bookkeeping data (even worse). The C standard makes this undefined behaviour, and for good reason.
If a version of a particular function accepts a size argument to limit how much data it writes, use it. It protects you against this stuff.
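A small sketch of that advice, using snprintf as the size-limited function. The helper name copy_bounded is made up for this example.

```c
#include <stdio.h>

/* Copies src into dst, writing at most dstsize bytes including the
   terminating '\0'. Returns 1 if the whole string fit, 0 if it was
   truncated (or if snprintf reported an error). */
int copy_bounded(char *dst, size_t dstsize, const char *src)
{
    int needed = snprintf(dst, dstsize, "%s", src);
    return needed >= 0 && (size_t)needed < dstsize;
}
```

With a char foo[4], the call copy_bounded(foo, sizeof foo, "a long string here") leaves "a l" in foo and returns 0, instead of writing past the end of the array.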
C does not keep track of the size of strings (or arrays, or allocated memory, etc.), so that is your job. If you create a string, you must be careful to always make sure it never gets longer than the amount of memory you've allocated to it.
In C, a string is an array of characters, or a pointer to memory holding characters: a sequence of zero or more characters followed by a NUL ('\0') character. It is important to preserve the terminating NUL, as it is how C marks the end of variable-length strings. All the C standard library string functions require it to operate correctly.
For complete reference refer this
The strcpy function has no knowledge of the size of the destination array, which is why it is considered unsafe.
You may use strncpy instead, where you pass the size of the buffer. If a longer argument is provided, at most that many bytes are written, so memory beyond the buffer is left untouched.
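One caveat when using strncpy: if the source does not fit, it fills the buffer without writing a terminating '\0'. A minimal sketch that leaves room for the terminator and writes it explicitly (copy_truncating is a hypothetical helper name):

```c
#include <string.h>

/* Copies at most dstsize - 1 characters from src and always terminates
   dst. dstsize must be at least 1. */
void copy_truncating(char *dst, size_t dstsize, const char *src)
{
    strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';   /* strncpy does not add this on truncation */
}
```

For a 4-byte buffer and the source "123456789", this stores "123": three characters plus the terminator, not the four characters one might first expect.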
This may be a very basic question for some. I was trying to understand how strcpy works actually behind the scenes. for example, in this code
#include <stdio.h>
#include <string.h>
int main ()
{
char s[6] = "Hello";
char a[20] = "world isnsadsdas";
strcpy(s,a);
printf("%s\n",s);
printf("%zu\n", sizeof(s));
return 0;
}
I am declaring s to be a static array with a size smaller than that of the source. I thought it wouldn't print the whole string, but it did print world isnsadsdas. So I thought that strcpy might be allocating a new size if the destination is smaller than the source. But when I check sizeof(s), it is still 6, yet it is printing more than that. How is that actually working?
You've just caused undefined behaviour, so anything can happen. In your case, you're getting lucky and it's not crashing, but you shouldn't rely on that happening. Here's a simplified strcpy implementation (but it's not too far off from many real ones):
char *strcpy(char *d, const char *s)
{
char *saved = d;
while (*s)
{
*d++ = *s++;
}
*d = 0;
return saved;
}
sizeof is just returning you the size of your array from compile time. If you use strlen, I think you'll see what you expect. But as I mentioned above, relying on undefined behaviour is a bad idea.
http://natashenka.ca/wp-content/uploads/2014/01/strcpy8x11.png
strcpy is considered dangerous for reasons like the one you are demonstrating. The two buffers you created are local variables stored in the stack frame of the function. Here is roughly what the stack frame looks like:
http://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/Call_stack_layout.svg/342px-Call_stack_layout.svg.png
FYI things are put on top of the stack meaning it grows backwards through memory (This does not mean the variables in memory are read backwards, just that newer ones are put 'behind' older ones). So that means if you write far enough into the locals section of your function's stack frame, you will write forward over every other stack variable after the variable you are copying to and break into other sections, and eventually overwrite the return pointer. The result is that if you are clever, you have full control of where the function returns. You could make it do anything really, but it isn't YOU that is the concern.
As you seem to know, having made your first buffer 6 chars long for a 5-character string, C strings end in a null byte \x00. The strcpy function copies bytes until the source byte is 0, but it does not check that the destination is that long, which is why it can copy past the end of the array. This is also why your print reads past the size of the buffer: it reads until \x00. Interestingly, strcpy may have written into the data of a, depending on the order the compiler gave the two arrays on the stack, so a fun exercise could be to also print a and see if you get something like 'snsadsdas'. I can't be sure what it would look like even if a is being polluted, because there are sometimes bytes in between the stack entries for various reasons.
If this buffer holds say, a password to check in code with a hashing function, and you copy it to a buffer in the stack from wherever you get it (a network packet if a server, or a text box, etc) you very well may copy more data from the source than the destination buffer can hold and give return control of your program to whatever user was able to send a packet to you or try a password. They just have to type the right number of characters, and then the correct characters that represent an address to somewhere in ram to jump to.
You can use strcpy if you check the bounds and maybe trim the source string, but it is considered bad practice. There are more modern functions that take a max length like http://www.cplusplus.com/reference/cstring/strncpy/
Oh, and lastly, this is all called a buffer overflow. Some compilers insert a small random value (often called a stack canary) between a function's local buffers and its saved return address. Before the function returns, compiler-generated code checks that the value is unchanged and terminates the program if it is not. This solves a lot of security problems, but on some platforms it is still possible to copy bytes far enough into the stack to overwrite a pointer to the handler that runs when something goes wrong, which lets an attacker do the same thing. It just becomes a lot harder to do right.
In C there is no bounds checking of arrays; it's a trade-off that buys better performance at the risk of shooting yourself in the foot.
strcpy() doesn't care whether the target buffer is big enough so copying too many bytes will cause undefined behavior.
That is one of the reasons a newer variant, strcpy_s(), was introduced, in which you specify the size of the target buffer.
Note that sizeof(s) is determined at compile time. Use strlen() to find the number of characters currently stored in s. When you perform strcpy(), the destination string is overwritten by the source string, so your output won't be "Helloworld isnsadsdas".
#include <stdio.h>
#include <string.h>
int main(void)
{
char s[20] = "Hello"; /* enlarged so that the copy fits */
char a[20] = "world isnsadsdas";
strcpy(s,a);
printf("%s\n",s);
printf("%zu\n", strlen(s));
}
You are relying on undefined behaviour, inasmuch as the compiler has chosen to place the two arrays where your code happens to work. This may not work in future.
As to the sizeof operator, this is figured out at compile time.
Once you use adequate array sizes you need to use strlen to fetch the length of the strings.
The best way to understand how strcpy works behind the scene is...reading its source code!
You can read the source for GLibC : http://fossies.org/dox/glibc-2.17/strcpy_8c_source.html . I hope it helps!
At the end of every string/character array there is a null terminator character '\0' which marks the end of the string/character array.
strcpy() performs its task until it sees the '\0' character.
printf() with %s also performs its task until it sees the '\0' character.
sizeof() on the other hand is not interested in the content of the array, only its allocated size (how big it is supposed to be), thus not taking into consideration where the string/character array actually ends (how big it actually is).
As opposed to sizeof(), there is strlen() that is interested in how long the string actually is (not how long it was supposed to be) and thus counts the number of characters until it reaches the end ('\0' character) where it stops (it doesn't include the '\0' character).
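To make the difference concrete, here is a small sketch. The helper room_left is a made-up name: it combines the capacity that sizeof reports with the length that strlen reports.

```c
#include <stddef.h>
#include <string.h>

/* How many more characters fit into a buffer of the given capacity
   that currently holds the string s, keeping one byte for the '\0'. */
size_t room_left(const char *s, size_t capacity)
{
    size_t used = strlen(s) + 1;   /* characters plus the terminator */
    return used <= capacity ? capacity - used : 0;
}
```

For char s[20] = "Hello"; sizeof s is 20, strlen(s) is 5, and room_left(s, sizeof s) is 14. Note that sizeof only reports the array size where s is visible as an array; inside a function that receives a char * it would give the size of the pointer.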
Better Solution is
char *strcpy(char *p,char const *q)
{
char *saved=p;
while(*p++=*q++);
return saved;
}
Hi I have a char string
name[50] = "I love programming"
what happen is that I want to empty this string before I call my another function so that I can store something in the same array
will this work?
name[0] = '\0';
or is there anyway to empty the string without creating any new function or use any other library?
Setting first char to nul is perfectly acceptable. But if that string was sensitive in terms of security, then you should zero it out with memset.
Edit:
Matteo Italia's answer made me dig a bit deeper on this subject. According to this document (and Matteo's answer), the memset call could be optimized away, so it is not the best option for removing sensitive information from memory. The document lists several options, but none of them is both portable and reliable, so it proposes a new standard function, memset_s, just for such purposes. That function has since been added to C11, but only in the optional Annex K, so many implementations do not provide it. In practice we're still stuck with non-portable (SecureZeroMemory), non-reliable (volatile trick), or non-optimal options (secure_memset example).
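For reference, the "volatile trick" usually looks something like the sketch below. The name wipe_bytes is hypothetical, and as noted above this approach is not guaranteed: writing through a volatile pointer discourages the compiler from removing the stores, but the standard does not strictly promise it when the underlying object is not itself volatile.

```c
#include <stddef.h>

/* Overwrites n bytes at p with zeros, one byte at a time, through a
   pointer to volatile. A best-effort wipe, not a guaranteed one. */
void wipe_bytes(void *p, size_t n)
{
    volatile unsigned char *vp = p;
    while (n--)
        *vp++ = 0;
}
```

For the array in the question this would be called as wipe_bytes(name, sizeof name).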
There's really no concept of emptying a char string in C. It's simply a pointer that points to some allocated memory. You can reuse that memory in any way you wish, without "emptying" it first. So, the easiest way to empty it is "just don't".
If you want to explicitly clear all contents of the string for some reason, use the memset approach given in other answers.
If you want to "empty" it in the sense that when it's printed, nothing will be printed, then yes, just set the first char to '\0'.
To conclude, it all depends on what you really want to do. Why do you want to "empty" the string?
Use memset instead. This zeroes the buffer; the memory itself is released from the stack anyway when the variable goes out of scope.
memset (name,'\0',sizeof(name));
IIRC, you might use memset this way:
char * myString;
...
size_t len = strlen(myString);
memset (myString, 0,len);
Technically it is correct, for example:
char array[10] = "hello";
printf("%zu\r\n", strlen(array)); // prints 5
array[0] = '\0';
printf("%zu\r\n", strlen(array)); // prints 0
memset(name, 0, 50);
or
bzero(name, 50);
It depends on the effect you want to obtain. If you just want to zero its length you can do, as you said:
*name='\0';
If, instead, you want to clean your string from sensitive data, you should zero it completely with memset (some operating systems also have a "secure" zeroing function that should be guaranteed not to be optimized away by the compiler - see e.g. SecureZeroMemory on Windows).
On the other hand, if the function you are calling just uses the buffer you are passing as an output buffer disregarding its content, you may just leave the buffer as it is.
I'm writing a C code for a class. This class requires that our code compile and run on the school server, which is a sparc solaris machine. I'm running Linux x64.
I have this line to parse (THIS IS NOT ACTUAL CODE BUT IS INPUT TO MY PROGRAM):
while ( cond1 ){
I need to capture the "while" and the "cond1" into separate strings. I've been using strtok() to do this. In Linux, the following lines:
char *cond = NULL;
cond = (char *)malloc(sizeof(char));
memset(cond, 0, sizeof(char));
strcpy(cond, strtok(NULL, ": \t\(){")); //already got the "while" out of the line
will correctly capture the string "cond1". Running this on the Solaris machine, however, gives me the string "cone1".
Note that in plenty of other cases within my program, strings are being copied correctly. (For instance, the "while" was captured correctly.)
Does anyone know what is going on here?
The line:
cond = (char *)malloc(sizeof(char));
allocates exactly one char for storage, into which you are then copying more than one - strcpy needs to put, at a bare minimum, the null terminator but, in your case, also the results of your strtok as well.
The reason it may work on a different system is that some implementations of malloc will allocate at a certain resolution (e.g., a multiple of 16 bytes) no matter what actual value you ask for, so you may have some free space there at the end of your buffer. But what you're attempting is still very much undefined behaviour.
The fact that the undefined behaviour may be to work sometimes in no way abrogates your responsibility to avoid such behaviour.
Allocate enough space for storing the results of your strtok and you should be okay.
The safest way to do this is to dynamically allocate the space so that it's at least as big as the string you're passing to strtok. That way there can be no possibility of overflow (other than weird edge cases where other threads may modify the data behind your back but, if that were the case, strtok would be a very bad choice anyway).
Something like (if instr is your original input string):
cond = (char*)malloc(strlen(instr)+1);
This guarantees that any token extracted from instr will fit within cond.
As an aside, sizeof(char) is always 1 by definition, so you don't need to multiply by it.
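An alternative sketch sizes the buffer to the token itself rather than to the whole input line. The helper name dup_token is an assumption of this example; it also guards against strtok returning NULL when no token is left, which the strcpy call in the question would not survive.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of token, or NULL if token is NULL or
   the allocation fails. The caller frees the result. */
char *dup_token(const char *token)
{
    if (token == NULL)
        return NULL;
    size_t n = strlen(token) + 1;   /* include the terminating '\0' */
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, token, n);
    return copy;
}
```

Used as cond = dup_token(strtok(NULL, ": \t(){")); the allocation is always exactly as large as the token plus its terminator.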
cond is being allocated one byte. strcpy is copying at least two bytes to that allocation. That is, you are writing more bytes into the allocation than there is room for.
One way to fix it is to use char *cond = malloc (1000); instead of what you've got.
You only allocated memory for 1 character, but you are trying to store at least 6 (you need space for the terminating \0). The quick and dirty way to solve this is just to say
char cond[128]
instead of malloc.