I'm working to try and understand some string functions so I can more effectively use them in later coding projects, so I set up the simple program below:
#include <stdio.h>
#include <string.h>
int main (void)
{
// Declare variables:
char test_string[5];
char test_string2[] = { 'G', 'O', '_', 'T', 'E', 'S', 'T'};
int init;
int length = 0;
int match;
// Initialize array:
for (init = 0; init < strlen(test_string); init++)
{ test_string[init] = '\0';
}
// Fill array:
test_string[0] = 'T';
test_string[1] = 'E';
test_string[2] = 'S';
test_string[3] = 'T';
// Get Length:
length = strlen(test_string);
// Get number of characters from string 1 in string 2:
match = strspn(test_string, test_string2);
printf("\nstrlen return = %d", length);
printf("\nstrspn return = %d\n\n", match);
return 0;
}
I expect to see a return of:
strlen return = 4
strspn return = 4
However, I see strlen return = 6 and strspn return = 4. From what I understand, char test_string[5] should allocate 5 bytes of memory and place hex 00 into the fifth byte. The for loop (which should not even be necessary) should then set all the bytes of memory for test_string to hex 00. Then, the immediately following lines should fill test_string bytes 1 through 4 (or test_string[0] through test_string[3]) with what I have specified. Calling strlen at this point should return 4, because it should start at the address of test_string[0] and count up until it hits the first null character, which is at test_string[4]. Yet strlen returns 6. Can anyone explain this? Thanks!
char test_string[5];
test_string is an array of 5 uninitialized char objects.
for (init = 0; init < strlen(test_string); init++)
Kaboom. strlen scans for the first '\0' null character. Since the contents of test_string are garbage, the behavior is undefined. It might return a small value if there happens to be a null character, or it might return a large value or crash the program if there doesn't happen to be a zero byte in test_string.
Even if that weren't the case, evaluating strlen() in the header of a for loop is inefficient. Each strlen() call has to re-scan the entire string (assuming you've given it a valid string), so if your loop worked it would be O(N^2).
If you want test_string to contain just zero bytes, you can initialize it that way:
char test_string[5] = "";
or, since you initialize the first 4 bytes later:
char test_string[5] = "TEST";
or just:
char test_string[] = "TEST";
(The latter lets the compiler figure out that it needs 5 bytes.)
Going back to your declarations:
char test_string2[] = { 'G', 'O', '_', 'T', 'E', 'S', 'T'};
This causes test_string2 to be 7 bytes long, without a trailing '\0' character. That means that passing test_string2 to any function that expects a pointer to a string will cause undefined behavior. You probably want something like:
char test_string2[] = "GO_TEST";
strlen counts characters until it reaches the first '\0'. Because your test_string is uninitialized, there is no '\0' where you expect one, so strlen keeps going until it finds one, which here happened to be 6 bytes from the start of your array.
The compiler does not generate code to initialize the array so you don't have to pay to run that code if you fill it later.
To initialize it to 0 and skip the loop, you can use
char test_string[5] = {0};
This way, all characters will be initialized to 0 and your strlen will work after you fill the array with "TEST".
There are a few problems here. First of all, char test_string[5]; simply sets aside 5 bytes for that string, but does not set the bytes to anything. In particular, when you say "char test_string[5] should allocate 5 bytes of memory and place hex 00 into the fifth byte", the second part is wrong.
Secondly, your array initialization loop uses strlen(test_string) but since the bytes of test_string are uninitialized, there's no way to know what's there so strlen(test_string) returns some undefined result. A better way to clear the array would be memset( test_string, 0, sizeof(test_string) );.
You fill the array with "TEST" but don't set the null byte at the end, so the last byte is still uninitialized. If you do the memset above this will be fixed, or you can manually set test_string[4] = '\0'.
Related
The following code works as expected and outputs ABC:
#include <stdio.h>
void printString (char toPrint [100]);
int main()
{
char hello [100];
hello[0] = 'A';
hello[1] = 'B';
hello[2] = 'C';
hello[3] = '\0';
printString(hello);
}
void printString (char toPrint [100])
{
int i = 0;
while (toPrint[i] != '\0')
{
printf("%c", toPrint[i]);
++i;
}
}
But if I remove the line that adds the null-character
hello[3] = '\0';
I get random output like wBCÇL, ╗BCÄL, ┬BCNL etc.
Why is that so? What I expected is for the loop in printString() to run forever because it doesn't run into a '\0', but what happened to 'A', 'B' and 'C'? Why do B and C still show up in the output while A is replaced by some random character?
Your declaration of hello leaves it uninitialized, so it is filled with indeterminate bytes:
int main()
{
char hello [100];
...
}
If you want zero initialized array use
int main()
{
char hello [100] = {0};
...
}
There must have been, by pure chance, the value for \r somewhere in the memory cells following those of my array hello. That's why my character 'A' was overwritten.
On other machines, "ABC" was output as expected, followed by random characters.
Initializing the array with 0s, purposely omitted here, of course solves the problem.
edit:
I let the code print out each character in binary and toPrint[5] was indeed 00001101 which is ASCII for \r (carriage return).
When you declare an automatic like char hello [100];, the first thing to understand is that the 100 bytes can contain just about anything. You must assign values to each byte explicitly to do / have something meaningful.
You are terminating your loop when you find the \0, a.k.a. the NUL character. Now, if you comment out the statement that puts the \0 after the character 'C', your loop runs until it actually finds a \0.
Your array might contain \0 at some point or it might not. There are chances you might go beyond the 100 bytes still looking for a \0 and invoke undefined behaviour. You also invoke UB when you try to work with an unassigned piece of memory.
After assigning to the element at index 26 and printing the array, "Computer" is still printed, even though I assigned a character at index 26. I expected something like this: "Computer K "
What is the reason?
#include <stdio.h>
int main()
{
char m1[40] = "Computer";
printf("%s\n", m1); /*prints out "Computer"*/
m1[26] = 'K';
printf("%s\n", m1); /*prints out "Computer"*/
printf("%c", m1[26]); /*prints "K"*/
}
At index 8 of that string there is a \0 character, and %s prints only until it finds a \0 (the end of the string). The character 'K' is there at index 26, but it will not be printed because a \0 is found before it.
char s[100] = "Computer";
is basically the same as
char s[100] = { 'C', 'o', 'm', 'p', 'u','t','e','r', '\0'};
Since printf stops at the first 0 terminator, it won't print the character at index 26.
Whenever you partially initialize an array, the remaining elements are filled with zeroes. (This is a rule in the C standard, C17 6.7.9 §19.)
Therefore char m1[40] = "Computer"; ends up in memory like this:
[0] = 'C'
[1] = 'o'
...
[7] = 'r'
[8] = '\0' // the null terminator you automatically get by using the " " syntax
[9] = 0 // everything to zero from here on
...
[39] = 0
Now of course \0 and 0 mean the same thing, the value 0. Either will be interpreted as a null terminator.
If you go ahead and overwrite index 26 and then print the array as a string, it will still only print until it encounters the first null terminator at index 8.
If you do like this however:
#include <stdio.h>
int main()
{
char m1[40] = "Computer";
printf("%s\n", m1); // prints out "Computer"
m1[8] = 'K';
printf("%s\n", m1); // prints out "ComputerK"
}
You overwrite the null terminator, and the next zero that happened to be in the array is treated as null terminator instead. This code only works because we partially initialized the array, so we know there are more zeroes trailing.
Had you instead written
int main()
{
char m1[40];
strcpy(m1, "Computer");
This is not initialization but run-time assignment. strcpy would only set index 0 to 8 ("Computer" with null term at index 8). Remaining elements would be left uninitialized to garbage values, and writing m1[8] = 'K' would destroy the string, as it would then no longer be reliably null terminated. You would get undefined behavior when trying to print it: something like garbage output or a program crash.
In C strings are 0-terminated.
Your initialization fills all array elements after the 'r' with 0.
If you place a non-0 character in any random field of the array, this does not change anything in the fields before or after that element.
This means your string is still 0-terminated right after the 'r'.
How should any function know that after that string some other string might follow?
That's because after "Computer" there's a null terminator (\0) in your array. If you add a character after this \0, it won't be printed because printf() stops printing when it encounters a null terminator.
Just as an addition to the other users answers - you should try to answer your question by being more proactive in your learning. It is enough to write a simple program to understand what is happening.
#include <stdio.h>
int main(void)
{
char m1[40] = "Computer";
printf("%s\n", m1); /*prints out "Computer"*/
m1[26] = 'K';
for(size_t index = 0; index < 40; index++)
{
printf("m1[%zu] = 0x%hhx ('%c')\n", index, (unsigned char)m1[index], (m1[index] >=32) ? m1[index] : ' ');
}
}
I want to assign the first two values from the hash array to the salt array.
char hash[] = {"HAodcdZseTJTc"};
char salt[] = {hash[0], hash[1]};
printf("%s", salt);
However, when I attempt this, the first two values are assigned and then all thirteen values are also assigned to the salt array. So my output here is not:
HA
but instead:
HAHAodcdZseTJTc
salt is not null-terminated. Try:
char salt[] = {hash[0], hash[1], '\0'};
You are adding just two characters to the salt array and no '\0' terminator.
Passing an array that is not nul-terminated to printf() with a "%s" specifier causes undefined behavior. In your case it printed the contents of hash as well; in my case
HA#
was printed.
Strings in C use a special convention to know where they end: a non-printable special character, '\0', is appended to the end of a sequence of non-'\0' bytes, and that's how a C string is built.
For example, if you were to compute the length of a string you would do something like
size_t stringlength(const char *string)
{
size_t length;
for (length = 0 ; string[length] != '\0' ; ++length);
return length;
}
there are of course better ways of doing it, but I just want to illustrate what the significance of the terminating '\0' is.
Now that you know this, you should notice that
char string[] = {'A', 'B', 'C'};
is an array of char but it's not a string, for it to be a string, it needs a terminating '\0', so
char string[] = {'A', 'B', 'C', '\0'};
would actually be a string.
Notice that then, when you allocate space to store n characters, you need to allocate n + 1 bytes, to make room for the '\0'.
In the case of printf() it will try to consume all the bytes that the passed pointer points at, until one of them is '\0', there it would stop iterating through the bytes.
That also explains the undefined behavior: printf() would be reading out of bounds, and anything could happen, depending on what is actually stored at the memory addresses that lie beyond the passed data.
There are many functions in the standard library that expect strings, i.e. sequences of non-nul bytes followed by a nul byte.
I need to do a RPC. I'm trying to encode the length of a function name followed by the name of the function.
Function name: say_hello
Function name length: 9
Encoded array: [9, 's', 'a', 'y', ..., 'l', 'l', 'o']
So far:
unsigned char* encode_int(unsigned char *buffer, int value) {
buffer[0] = value >> 24;
buffer[1] = value >> 16;
buffer[2] = value >> 8;
buffer[3] = value;
return buffer + 4;
}
char* function_name = "say_hello";
char* buffer[256];
buffer = encode_int(&buffer, strlen(function_name));
strcpy(buffer, function, strlen(function_name));
puts(buffer);
You have lots of problems with your code. I won't just give you a working solution but will point out the problems. The first thing is that the code obviously doesn't compile. You are passing an undefined variable function to strcpy and give too many arguments to strcpy. I'll assume you've just transcribed the program incorrectly. But even if you fix that you will get a few compiler warnings which, if heeded, would identify most of your problems.
1. You are passing the address of buffer rather than the buffer itself to encode_int.
2. You declare buffer as an array of char pointers. It looks like what you really want is an array of char.
3. You encode an int into the start of the buffer and then try to print it as a string (via puts). That's not going to work and will result in no output (as you have probably found), because the int will have a 0 value in its first byte (as you have encoded it). That byte acts as a null terminator for the string, hence the blank output.
EDIT: Correction to point 3. You've actually incremented buffer by 4 (if everything else was fixed). So the puts will only show the function name (again, if everything else was fixed). And you've effectively lost the function name length.
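Put the word into buffer like this (with buffer declared as an array of char):
int len;
len = strlen(word);
memcpy(buffer, &len, sizeof(int)); /* stores len in the machine's native byte order */
strncpy(&buffer[sizeof(int)], word, len + 1); /* len + 1 also copies the terminating '\0' */
Change the puts line to read
printf("%d %s\n", len, &buffer[sizeof(int)]);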
I want to copy characters X through Y of a string into the out char array.
unsigned char * string = "HELLO WORLD!!!" // length 14
unsigned char out[9];
size_t length = 9;
for(i=0 ;i < length ;++i)
{
out[i] = string[i+3];
}
printf("%s = string\n%s = out\n", string, out);
When looking at the output of out, why is there gibberish after a certain point of my string? I see the contents of out as LO WORLD!# . Why are there weird characters appearing after the content I copied? Isn't out supposed to be an array of 9? I expected the output to be
LO WORLD!
In C you need to terminate your string with a 0x00 value so a string of length 9 needs ten bytes to store it with the last set to 0. Otherwise your print statements run off into random data.
unsigned char * string = "HELLO WORLD!!!"; // length 14
unsigned char out[10];
size_t length = 9;
size_t i;
for(i=0 ;i < length ;++i)
{
out[i] = string[i+3];
}
out[length] = 0x00;
printf("%s = string\n%s = out\n", string, out);
A minor point, but in C a string literal is an array of char that decays to char* (const char* in C++), not unsigned char*. char and unsigned char are distinct types even on implementations where plain char happens to be unsigned.
Furthermore, this is not true:
unsigned char * string = "HELLO WORLD!!!" // length 14
The string actually occupies 15 bytes -- there is an extra, hidden '\0' at the end, called a nul byte, which marks the end of the string. These nul terminators are very important, because if they're not present, then many C library functions which manipulate strings will keep going until they hit a byte with a value equal to '\0' -- and so can end up reading or trampling over bits of memory they shouldn't do. This is called a buffer overrun, and is a classic bug (and exploitable security problem) in C programmes.
In your example, you haven't included this nul terminator in your copied string, so printf() just keeps going until it finds one, hence the gibberish you're seeing. In general, it's a good idea to use C library functions to manipulate C strings where possible, as most of them add the terminator for you. strncpy from string.h can do the copy here, but it is an exception to that rule: when the source is at least as long as the count you pass, it writes no terminator, so you still have to set out[9] = '\0' yourself.
A 9 character string needs 10 bytes because it must be null ( 0 ) terminated. Try this:
unsigned char out[10]; // make this 10
size_t length = 9;
for(i=0 ;i < length ;++i)
{
out[i] = string[i+3];
}
out[i] = 0; // add this to terminate the string
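A shorter approach is strncpy, but note that it does not add the terminator when it stops at the count, so the second line is still needed (the casts are there because out and string are declared as unsigned char):
strncpy((char *)out, (const char *)string + 3, 9);
out[9] = 0;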
C strings must be null terminated. You only created an array large enough for 8 characters + the null terminator, but you never added the terminator.
So, you need to allocate the length plus 1 and add the terminator.
// initializes all elements to 0
char out[10] = {0};
// alternatively, add it at the end.
out[9] = '\0';
Think of it this way; you're passed a char* which represents a string. How do you know how long it is? How can you read it? Well, in C, a sentinel value is added to the end. This is the null terminator. It is how strings are read in C, and passing around unterminated strings to functions which expect C strings results in undefined behavior.
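And then... just use strncpy to copy strings, keeping in mind that it does not write the terminator itself when the source is at least as long as the count you pass.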
If you want to copy 9 characters from your string, you'll need an array of 10 to do that, because a C string needs '\0' as its terminating character. So your code should be rewritten like this:
unsigned char * string = "HELLO WORLD!!!"; // length 14
unsigned char out[10];
size_t length = 9;
size_t i;
for(i=0 ;i < length ;++i)
{
out[i] = string[i+3];
}
out[9] = 0;
printf("%s = string\n%s = out\n", string, out);