Character string declaration with static size - c

I was trying different ways to declare a string in C for exam preparation. We know that a string in C is a character array with '\0' at the end. Then I found that even if I declare an array of 5 characters and put 5 characters in it, like "abcde", it is accepted. Then where is the null character stored?
I declared strings in the following ways:
char str[] = {'a','b','c','d','e'};
char str2[] = "abcde";
char str3[5] = "abcde";
Now, in the 3rd case, I am allocating 5 bytes of space and I have exactly 5 characters in the array. If a string should have a null character at the end, where is it being stored? Or is it the case that the null character is not appended?
What about the 1st and 2nd cases, is the null character appended there?
strlen() returns 5 in all 3 cases.

The NUL character is stored only in the second example
char str2[] = "abcde";
where the array is sized automatically to include it. An array of characters does not have to be a string, and the other two are encoded without the NUL terminator.
If the code happens to treat them correctly as strings, that was an unfortunate result of undefined behaviour.
#include <stdio.h>

int main(void)
{
    char str[] = {'a','b','c','d','e'};
    char str2[] = "abcde";
    char str3[5] = "abcde";

    printf("%zu\n", sizeof str);
    printf("%zu\n", sizeof str2);
    printf("%zu\n", sizeof str3);
}
Program output:
5
6
5

In the third case, the null terminator is not appended. From the documentation:
If the size of the array is known, it may be one less than the size of
the string literal, in which case the terminating null character is
ignored:
So when you check its length with strlen, you get undefined behavior (and the same when you try it with the first one, since that also isn't a string).
An array any smaller than that is not allowed (for me, MSVC shows an error but still compiles it, for some reason). An array larger than the string is allowed, in which case the rest is zero-initialized.

Related

Does directly assigning a string of char's to a char pointer on initialization automatically add a null terminator?

For example in this code:
char *ptr = "string";
Is there a null terminator stored at ptr[6]?
When I test this and print a string, it prints "string", and if I print the ptr[6] char I get ''. I wanted to test this further so I did some research and found someone saying that strlen will always crash if there is not a null terminator. I ran this in my code and it returned 6, so does this mean that assigning a string to a char pointer initializes with a null terminator or am I misunderstanding what's happening?
Yes. String literals used as pointers will always end in a NUL byte. String literals used as array initializers will too, unless you specify a length that's too small for it to (e.g., char arr[] = "string"; and char arr[7] = "string"; both will, but char arr[6] = "string"; won't).

Getting wrong string length

I am trying to get the length of a string but I am getting the wrong value: it says the string is only 4 characters long. Why is this? Am I using sizeof() correctly?
#include <stdio.h>

int main(void)
{
    char *s;
    int len;

    s = "hello world";
    len = sizeof(s);
    printf("%d\n", len);
}
The sizeof operator is returning the size of the pointer. If you want the length of a string, use the strlen function.
Even if you had an array (e.g. char s[] = "hello world") the sizeof operator would return the wrong value, as it would return the length of the array which includes the string terminator character.
Oh, and as a side note: if you want a pointer to point to a string literal, you should declare it const char *, since attempting to modify a string literal is undefined behavior.
You have declared s as a pointer. When applied to a pointer, sizeof() returns the size of the pointer, not the size of the element pointed to. On your system, the size of a pointer to char happens to be four bytes. So you will see 4 as your output.
As an alternative to strlen(), you can initialize an array of chars from the string literal, char s[] = "hello world"; in this case sizeof returns the size of the array in bytes. Here that is 12: the 11 characters plus one extra byte for the '\0' at the end of the string.
Runtime complexity of sizeof() is O(1).
Complexity of strlen() is O(n).

How to get the string size in bytes?

As the title implies, my question is how to get the size of a string in C. Is it good to use sizeof if I've declared it (the string) in a function without malloc in it? Or, if I've declared it as a pointer? What if I initialized it with malloc? I would like to have an exhaustive response.
You can use strlen. Its result is determined by the terminating null character, so the string you pass must be valid, that is, null-terminated.
If you want the size of the memory buffer that contains your string, and all you have is a pointer to it:
If it is a dynamic array (created with malloc), it is impossible to get its size from the pointer, since the compiler doesn't know what the pointer is pointing at.
If it is a static array, you can use sizeof on the array itself to get its size.
Use strlen to get the length of a null-terminated string.
sizeof returns the size of the array, not the length of the string. If it's a pointer (char *s), not an array (char s[]), it won't work, since it will return the size of the pointer (usually 4 bytes on 32-bit systems and 8 bytes on 64-bit systems). An array passed to a function decays to a pointer (and an array cannot be returned from a function at all), so inside the function you lose the ability to use sizeof to check the size of the array.
So, only if the string spans the entire array (e.g. char s[] = "stuff"), would using sizeof for a statically defined array return what you want (and be faster as it wouldn't need to loop through to find the null-terminator) (if the last character is a null-terminator, you will need to subtract 1). If it doesn't span the entire array, it won't return what you want.
An alternative to all this is actually storing the size of the string.
While sizeof works for this specific type of string:
char str[] = "content";
int charcount = sizeof str - 1; // -1 to exclude terminating '\0'
It does not work if str is a pointer (sizeof returns the size of the pointer, usually 4 or 8) or an array with a specified length (sizeof returns the byte count matching the specified length, which for char is the same as the element count).
Just use strlen().
If you use sizeof(), then char *str and char str[] give different answers. For char str[], sizeof gives the size of the array (the string length plus the string terminator), while for char *str it gives the size of the pointer (which differs between platforms).
I like to use:
(strlen(string) + 1 ) * sizeof(char)
This gives you the buffer size in bytes needed for a copy of the string. You can use it with malloc() and snprintf():
const char *message = "Hello, World!";
size_t size = (strlen(message) + 1) * sizeof(char); /* characters plus '\0' */
char *string = malloc(size);
if (string != NULL)
    snprintf(string, size, "%s", message); /* copies message, always terminated */
Cheers!
Function: size_t strlen(const char *s)
There are two ways of finding the size of a string in bytes:
1st Solution:
#include <iostream>
#include <cctype>
#include <cstring>
using namespace std;

int main()
{
    char str[] = {"A lonely day."};
    cout << "The string bytes for str[] is: " << strlen(str);
    return 0;
}
2nd Solution:
#include <iostream>
#include <cstring>
using namespace std;

int main()
{
    char str[] = {"A lonely day."};
    cout << "The string bytes for str[] is: " << sizeof(str);
    return 0;
}
The two solutions produce different outputs. I will explain why below.
The 1st Solution uses strlen and, according to cplusplus.com,
The length of a C string is determined by the terminating null-character: A C string is as long as the number of characters between the beginning of the string and the terminating null character (without including the terminating null character itself).
That explains why the 1st Solution prints the number of characters in the string (13), while the 2nd Solution prints one more than that. But if you still don't understand, then continue reading.
The 2nd Solution uses sizeof to find the size of the string in bytes. According to this SO answer (which I have modified):
sizeof("f") must return 2: one byte for the 'f' and one for the terminating '\0' (terminating null-character).
That is why the output is 14: 13 bytes for the characters of the string and one for '\0'.
Conclusion:
To get the string length from the 2nd Solution, you must use sizeof(str)-1.
References:
Sizeof string literal
https://cplusplus.com/reference/cstring/strlen/?kw=strlen

Basic C question on copying char arrays to char pointers

I have some doubts in basic C programming.
I have a char array and I have to copy it to a char pointer. So I did the following:
char a[] = {0x3f, 0x4d};
char *p = a;
printf("a = %s\n",a);
printf("p = %s\n",p);
unsigned char str[] = {0x3b, 0x4b};
unsigned char *pstr =str;
memcpy(pstr, str, sizeof str);
printf("str = %s\n",str);
printf("pstr = %s\n",pstr);
My printf statements for pstr and str get appended with the data "a".
If I remove memcpy I get junk. Can some C Guru enlighten me?
Firstly, C strings (the %s in printf) are expected to be NUL-terminated. You're missing the terminators. Try char a[] = {0x3f, 0x4d, 0} (same goes for str).
Secondly, pstr and str point to the same memory, so your memcpy is a no-op. This is a minor point compared to the first one.
Add a null terminator, because that's what printf expects:
char a[] = {0x3f, 0x4d, '\0'};
The standard way C strings are represented is that in memory, they are a sequence of non-zero bytes representing the characters, followed by a zero (NUL) byte. You should declare:
char a[] = {0x3f, 0x4d, 0};
When you assign a string pointer (as in unsigned char *pstr = str;) both pointers point to the same memory area, and thus the same characters. There is no need to copy the characters.
When you do need to copy characters, remember that the sizeof operator returns the number of bytes its argument uses in memory: sizeof(pointer) is the number of bytes the pointer uses, not the length of the string. You find the length of a string (the number of characters before the terminator, so the string occupies one byte more than that in memory) with the strlen() function. Also, there are standard functions to copy C strings into a separate, sufficiently large buffer. You should rely on those to do the right thing:
strcpy(dest, str); /* dest: a hypothetical separate buffer of at least strlen(str) + 1 bytes */
printf's %s expects a 0-terminated string, your strings aren't. The uninitialized memory following your arrays may however happen to start with a 0-byte, in which case your code will appear to be correct - it still isn't.
You're declaring an array "str", then pointing to it with pstr. Note that you have no null-terminating character, so after using memcpy you copy the block to itself with no null terminator, as a string requires. Thus, printf can't find the end of the string and continues printing until it finds a 0 (or '\0' in character terms)
Agreed. You'll have to add a null byte at the end of your array of chars.
char a[] = {0x3f, 0x4d, '\0'};
The reason is that you're creating a character array without marking where the string actually ends. Your memcpy() call copies str onto itself (pstr points to the same array) and does not add a null byte. If the output looks right afterwards, that is an accident of undefined behaviour, not something memcpy() did for you.
Without a terminator, printf never knows where the string ends, so it reads into subsequent memory addresses and prints whatever values are stored there. When you're creating a string out of characters, always remember to end it with a null byte.

Passing not null terminated string to printf results in unexpected value

This C program gives a weird result:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char str1[5] = "abcde";
    char str2[5] = " haha";

    printf("%s\n", str1);
    return 0;
}
when I run this code I get:
abcde haha
I only want to print the first string as can be seen from the code.
Why does it print both of them?
"abcde" is actually 6 bytes long because of the null terminating character in C strings. When you do this:
char str1[5] = "abcde";
You aren't storing the null terminating character so it is not a proper string.
When you do this:
char str1[5] = "abcde";
char str2[5] = " haha";
printf("%s\n", str1);
It just happens to be that the second string is stored right after the first, although this is not required. By calling printf on a string that isn't null terminated you have already caused undefined behavior.
Update:
As stated in the comments by clcto this can be avoided by not explicitly specifying the size of the array and letting the compiler determine it based off of the string:
char str1[] = "abcde";
or use a pointer instead if that works for your use case, although they are not the same:
const char *str1 = "abcde";
Both strings str1 and str2 are not null terminated. Therefore the statement
printf("%s\n", str1);
will invoke undefined behavior.
printf prints the characters in a string one by one until it encounters a '\0', which is not present in your array. In this case printf continues past the end of the array until it finds a null character somewhere in memory. In your case it seems that printf runs past the end of "abcde" and continues to print the characters of the second array, " haha", which by chance is located just after the first one in memory.
Better to change the block
char str1[5] = "abcde";
char str2[5] = " haha";
to
char str1[] = "abcde";
char str2[] = " haha";
to avoid this problem.
Technically, this behavior is not unexpected, it is undefined: your code is passing a pointer to a C string that lacks null terminator to printf, which is undefined behavior.
In your case, though, it happens that the compiler places two strings back-to-back in memory, so printf runs into null terminator after printing str2, which explains the result that you get.
If you would like to print only the first string, add space for null terminator, like this:
char str1[6] = "abcde";
Better yet, let the compiler compute the correct size for you:
char str1[] = "abcde";
You have invoked undefined behaviour. Here:
char str1[5] = "abcde";
str1 has space for the above five letters but no null terminator.
Passing this array, which is not null-terminated, to printf for printing then invokes undefined behaviour.
In general, it is not a good idea to pass arrays that are not null-terminated to standard functions that expect (C) strings.
In C such declarations
char str1[5] = "abcde";
are allowed. In fact the string literal supplies 6 initializers, because it includes the terminating zero. However, the character array declared on the left side has only 5 elements, so the terminating zero is not stored.
The result is the same as if you had written
char str1[5] = { 'a', 'b', 'c', 'd', 'e' };
If you compile the string-literal declaration in C++, the compiler issues an error.
It would be better to declare the array without specifying explicitly its size.
char str1[] = "abcde";
In this case it would have the size equal to the number of characters in the string literal including the terminating zero that is equal to 6. And you can write
printf("%s\n", str1);
Otherwise the function continues to print characters beyond the array until it meets a zero character.
Nevertheless, the 5-element declaration is not an error in itself. You simply need to give printf a precision so that it reads no more than 5 characters:
printf("%5.5s\n", str1);
and you will get the expected result.
The %s specifier reads until it finds a null terminator. Therefore you need to leave room for the '\0' at the end of your array; the string literal already supplies it:
char str1[6] = "abcde";

Resources