sizeof for a string in array of strings - c

I have been trying to switch from Python to C for some time. While checking out a few functions, what caught my attention was the sizeof operator, which returns the size of an object in bytes. I created an array of strings and wanted to find the number of elements in it. I know that this can be done with sizeof(array)/sizeof(array[0]). However, I find this a bit confusing.
I expected that the large array would be 2D (which is just a 1D array represented differently) and that each character array within it would occupy as many bytes as the longest character array in it. Example below:
#include <stdio.h>
#include <string.h>
const char *words[] = {"this","that","Indian","he","she","sometimes","watch","now","browser","whatsapp","google","telegram","cp","python","cpp","vim","emacs","jupyter","space","earphones","laptop","charger","whiteboard","chalk","marker","matrix","theory","optimization","gradient","descent","numpy","sklearn","pandas","torch","array"};
const int length = sizeof(words)/sizeof(words[0]);
int main(void)
{
    printf("%s", words[1]);
    printf("%i", length);
    /* sizeof and strlen yield a size_t, so print them with %zu */
    printf("\n%zu", sizeof(words[0]));
    printf("\n%zu %zu %s", sizeof(words[27]), strlen(words[27]), words[27]);
    return 0;
}
[OUT]
that35
8
8 12 optimization
Each of the character arrays occupies 8 bytes, including the character array "optimization". I don't understand what is going on here. The strlen function gives the expected output, since it just looks for the null character in the character array, but I expected the output of the sizeof operator to be 1 more than the output of strlen.
PS: I didn't find any resource that addresses this issue.

This happens because words[27] is a pointer, so sizeof(words[27]) gives the size of a pointer. Pointers have a fixed size on a given platform, typically 8 bytes on x86_64. words itself is an array of pointers.
each of the character arrays occupy 8 bytes, including the character array "optimization".
No. Each word occupies a fixed amount of memory (its length plus one byte for the terminating null character). The 8 bytes are the size of the pointer that stores the address of the word, not the size of the word itself.
const int length = sizeof(words)/sizeof(words[0]);
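The above line gives 35 because words has not decayed to a pointer at this point: it is still an array, so sizeof(words) is the size of the whole array of 35 pointers and sizeof(words[0]) is the size of one pointer. words is a global variable stored in the program's data section, but what matters for the count is that sizeof is applied to the array itself. An array decays to a pointer in other contexts, for example when it is passed to a function.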
Read More about pointer decaying:
https://www.geeksforgeeks.org/what-is-array-decay-in-c-how-can-it-be-prevented/
https://www.opensourceforu.com/2016/09/decayintopointers/

words is an array of pointers to const char, statically initialized, with each element pointing to one string. [diagram not preserved]
In practice, the elements of words will probably point to separate string literals in read-only data. To get the length of each word when words is used in this manner, it is entirely appropriate to use strlen.

Related

Why is an odd-sized char array that follows an even-sized array not stored on the next available memory byte?

Let's assume we are on a 32-bit computer and programming in C. We define an array of type int, whose length is 2, followed by an array of type char, whose length is 8. When we look at the memory address of the first index of the char array, it differs by 4 from the memory address of the last index of the int array. That makes sense to me because the last index of the int array allocates 4 bytes of memory. However, let's assume, instead of a char array of length 8, we define one of length 9 (or whatever odd number). In this case, the memory address of the first index of the char array differs by more than 4 from the memory address of the last index of the int array. In this last case, why does the char array not allocate the next available byte in memory but skips allocation by some bytes?

Why strtok_r break string at '.'(peroid) instead of ',' (comma)? [duplicate]

Why I am getting this decimal int in this char?

I am starting C after having learned Python, and I have some doubts about a few concepts.
I am running this example on a 64-bit machine.
/* I understand that "vid" here is only a char like any other, not an array of char,
and its sizeof is 1 byte. The decimal int is 100 and the char is 'd'.
Why? 'vid' does not exist in the ASCII table. How does the compiler deal with that? */
char name = "vid";
/* sizeof is 8 bytes. I am not sure why: if a char is an int, then an array
of char would be an array of int, and if an int takes 2 or 4 bytes of storage,
then for 3 chars plus the null byte ('\0') we get 3 * 2 bytes + 1 * 2 bytes = 8 bytes.
Am I correct? And why do we need to use * to declare it? Isn't * for pointers?
How does this syntax work? */
char *name_ = "vid";
A string constant like "vid" decays into a pointer to its first byte, and when you convert a pointer to a char, the program truncates the pointer's value to make it fit. Apparently, on your machine that happens to produce 100, which is the ASCII code of 'd'. If you compile with GCC, you get an "initialization makes integer from pointer without a cast" warning for that.
sizeof(name_) == sizeof(char*), which is 8 on a machine with 64-bit pointers. sizeof("vid") == 4, by definition: sizeof measures size in char units.
In the first example, name = "vid", you are not assigning the string "vid" to name. By convention, a string constant evaluates to a pointer to its first element, so in the first statement you are assigning the address of "vid". As others said, it is only by accident that the number left in name after the address was cut down to 1 byte is the ASCII code of 'd'. If you turn on warnings, you will get a message telling you that you are trying to make a char from a char *, which is not compatible because a char can hold only 1 byte.
In the second example, char *name_ = "vid", you are assigning the address of "vid" to name_, which is correct because name_ is a pointer to char.
Note that you are not storing the string "vid" in name_. The string constant "vid" is stored somewhere in read-only memory, and the address of the first element of that string constant is assigned to name_.
For your first example, I am not sure how that compiles. You are attempting to assign an array of characters to a single character. This shouldn't be allowed without some kind of warning.
For the second, you are taking the sizeof of a char*, which is a pointer. Any time you add * to a type, you make it a pointer. In your case, this is 8 bytes, regardless of how much data it is pointing to. If you want to know the size of the data and not the size of the pointer, then you'd need to do the following:
sizeof(name_[0]) * 4
or
sizeof(char) * 4
Since your array is 3 characters, +1 for the null character, making it 4 characters long. This takes the size of the first element (a single character, 1 byte) and multiplies it by the length of the string. Thus, your data size should be 4 bytes.
Your first line of code initializes name, a single char, from a pointer to an array of four characters placed in the static data segment. So when the pointer is converted to a char, you get its lowest byte: 0x??????????????64, where '??' are unknown bytes.
As for the second line, you are getting the sizeof of the pointer. On 64-bit systems pointers are 64 bits, or 8 bytes, wide. That is what you get.

size of character array and size of character pointer

I have a piece of C code and I don't understand how the sizeof(...) function works:
#include <stdio.h>
int main(void){
    const char firstname[] = "bobby";
    const char* lastname = "eraserhead";
    /* sizeof yields a size_t, so print it with %zu */
    printf("%zu\n", sizeof(firstname) + sizeof(lastname));
    return 0;
}
In the above code sizeof(firstname) is 6 and sizeof(lastname) is 8.
But bobby is 5 characters wide and eraserhead is 11 wide. I expect 16.
Why is sizeof behaving differently for the character array and pointer to character?
Can any one clarify?
firstname is a char array carrying a trailing 0-terminator. lastname is a pointer. On a 64-bit system pointers are 8 bytes wide.
sizeof an array is the size of the whole array. In the case of "bobby", that is 5 characters plus one trailing \0, which equals 6.
sizeof a pointer is the size of the pointer, which is normally 4 bytes on a 32-bit machine and 8 bytes on a 64-bit machine.
The size of your first array is the size of bobby\0. \0 is the terminator character, so it is 6.
The second size is the size of a pointer, which is 8 bytes on your 64-bit system. Its size doesn't depend on the length of the assigned string.
how the sizeof(...) function works
sizeof() looks like a function, but it is not one: it is an operator. A function computes something at run time.
sizeof() asks the compiler, at compile time, how much memory it allocates for the argument (variable-length arrays are the one exception, where the size is computed at run time). BTW, sizeof() has no idea how much of that memory you actually use later at run time. In other words, you've "hardcoded" the printf arguments in your example.
Why is sizeof behaving differently for the character array and pointer to character?
A pointer rarely requires the same amount of memory as an array.
In general, the amount of memory allocated for a pointer is different to what is allocated for its pointee.
firstname is an array of 6 chars, including the terminating '\0' character at the end of the string. That's why sizeof firstname is 6.
lastname is a pointer to char, and will have whatever size such a pointer has on your system. Typical values are 4 and 8. The size of lastname will be the same no matter what it is pointing to (or even if it is pointing to nothing at all).
firstname[] is null-terminated, which adds 1 to the length.
sizeof(lastname) gives the size of the pointer rather than the size of the string it points to.

Trying to Wrap My Head Around String Sizes in C

A friend and I are doing a C programming unit for college.
We understand that there is no "string" per se in C, and instead, a string is defined by being an array of characters. Awesome!
So when dealing with "strings", it is obvious that a proper understanding of arrays and pointers is important.
We were doing really well understanding pointer declaration, when and when not to dereference the pointer, and played around with a number of printf's to test our experiments. All with great success.
However, when we used this:
char *myvar = "";
myvar = "dhjfejfdhdkjfhdjkfhdjkfhdjfhdfhdjhdsjfkdhjdfhddskjdkljdklc";
printf("Size is %zu\n", sizeof(myvar));
and it spits out Size is 8!
Why 8? Clearly there are more than 8 bytes being consumed by 'myvar' (or is it)?
(I should be clear and point out that I am VERY aware of 'strlen'. This is not an exercise in getting the length of a string. This is about trying to understand why sizeof returns 8 bytes for the variable myvar.)
8 is the size of the pointer.
myvar is a pointer to char (hence char*), and on a 64-bit system pointers are 64 bits = 8 bytes.
To get the length of a null-terminated string, use this code:
#include <string.h>
#include <stdio.h>
int main(void)
{
    const char *x = "hello there";
    /* strlen returns a size_t, so print it with %zu */
    printf("%zu\n", strlen(x));
    return 0;
}
Well as AbiusX said, the reason why sizeof is returning 8 is because you are finding the size of a pointer (and I'm guessing you're on a 64-bit machine). For example, that same code-snippet would return 4 on my machine.
Strings in C are kept as an array of characters followed by a null terminator. So when you do this...
const char *message = "hello, world!";
It's actually stored in memory as:
'h''e''l''l''o'','' ''w''o''r''l''d''!''\0'...garbage here
If you read past the null terminator, you'll likely just find whatever garbage happens to be in memory there at the time. So in order to find the length of a string in C, you need to start at the beginning of the string and read until the null terminator.
size_t count = 0;
const char *message = "hello, world!";
for ( ; message[count] != '\0'; count++ );
printf("size of message %zu\n", count);
Now this is an O(n) operation (because you have to iterate over the entire array to get the size). Most higher level languages have their upper level abstraction of strings as something similar to...
struct string {
char *c_str;
size_t length;
};
And then they just keep track of how long the string is whenever they do an operation on it. This greatly speeds up finding the length of a string, which is a very common operation.
Now there is one way you can figure out the length of a string using sizeof, but I don't suggest it. Using sizeof on an array (not a pointer!) returns the number of elements multiplied by the size of the element type. And C can work out the size of an array by itself as long as it can be determined at compile time.
const char message[] = "hello, world!";
printf("size of message %zu\n", sizeof(message));
That will print the correct size of the message. Remember, this is NOT suggested. Notice that this will print one greater than the number of characters in the string. That's because it also counts the null terminator (as it has to allocate an array large enough to have the null terminator). So it's not really the real length of the string (you can always just subtract one).
myvar is a pointer. You seem to be on a 64-bit machine, so sizeof returns 8 bytes. What you're probably looking for instead is strlen().
Like AbiusX said, 8 is the size of the pointer. strlen can tell you the length of the string (man page).

Resources