Trying to Wrap My Head Around String Sizes in C

A friend and I are doing a C programming unit for college.
We understand that there is no "string" per se in C, and instead, a string is defined by being an array of characters. Awesome!
So when dealing with "strings", it is obvious that a proper understanding of arrays and pointers is important.
We were doing really well understanding pointer declaration, when and when not to dereference the pointer, and played around with a number of printf calls to test our experiments. All with great success.
However, when we used this:
char *myvar = "";
myvar = "dhjfejfdhdkjfhdjkfhdjkfhdjfhdfhdjhdsjfkdhjdfhddskjdkljdklc";
printf("Size is %zu\n", sizeof(myvar));
and it spits out Size is 8!
Why 8? Clearly there are more than 8 bytes being consumed by 'myvar' (or are there?)
(I should be clear and point out that I am VERY aware of 'strlen'. This is not an exercise in getting the length of a string. This is about trying to understand why sizeof returns 8 bytes for the variable myvar.)

8 is the size of the pointer.
myvar is a pointer to char (hence char*), and on a 64-bit system pointers are typically 64 bits, i.e. 8 bytes.
To get the length of a null-terminated string, use strlen:
#include <stdio.h>
#include <string.h>
int main(void)
{
    const char *x = "hello there";
    printf("%zu\n", strlen(x)); /* strlen returns size_t, printed with %zu */
    return 0;
}

Well as AbiusX said, the reason why sizeof is returning 8 is because you are finding the size of a pointer (and I'm guessing you're on a 64-bit machine). For example, that same code-snippet would return 4 on my machine.
Strings in C are kept as an array of characters followed by a null terminator. So when you do this...
const char *message = "hello, world!";
It's actually stored in memory as:
'h' 'e' 'l' 'l' 'o' ',' ' ' 'w' 'o' 'r' 'l' 'd' '!' '\0' ...garbage here
If you read past the null terminator, you'll likely just find whatever garbage happens to be in memory there at the time. So in order to find the length of a string in C, you need to start at the beginning of the string and read until the null terminator.
size_t count = 0;
const char *message = "hello, world!";
for ( ; message[count] != '\0'; count++ )
    ; /* empty body: just advance count until the terminator */
printf("size of message %zu\n", count);
Now this is an O(n) operation (because you have to iterate over the entire array to get the size). Most higher-level languages implement their string abstraction as something similar to...
struct string {
char *c_str;
size_t length;
};
And then they just keep track of how long the string is whenever they do an operation on it. This greatly speeds up finding the length of a string, which is a very common operation.
Now there is one way you can figure out the length of a string using sizeof, but I don't suggest it. Using sizeof on an array (not a pointer!) returns the number of elements multiplied by the size of each element. And the compiler can work out the size of an array from its initializer, since that is known at compile time.
const char message[] = "hello, world!";
printf("size of message %zu\n", sizeof(message));
That will print the size of the array. Remember, this is NOT suggested. Notice that this prints one more than the number of characters in the string. That's because it also counts the null terminator (the compiler has to allocate an array large enough to hold it). So it's not really the length of the string (though you can always just subtract one).

myvar is a pointer. You seem to be on a 64-bit machine, so sizeof returns 8 (bytes). What you're probably looking for instead is strlen().

Like AbiusX said, 8 is the size of the pointer. strlen can tell you the length of the string (see its man page).

Related

sizeof for a string in array of strings

I've been trying to switch from Python to C for some time and was checking out a few functions. What caught my attention is the sizeof operator, which returns the size of an object in bytes. I created an array of strings and wanted to find the size of the array. I know that it can be done with sizeof(array)/sizeof(array[0]). However, I find this a bit confusing.
I expected that the large array would be 2D (which is just a 1D array represented differently) and that each character array within it would occupy as many bytes as the longest character array in it. Example below:
#include <stdio.h>
#include <string.h>
const char *words[] = {"this","that","Indian","he","she","sometimes","watch","now","browser","whatsapp","google","telegram","cp","python","cpp","vim","emacs","jupyter","space","earphones","laptop","charger","whiteboard","chalk","marker","matrix","theory","optimization","gradient","descent","numpy","sklearn","pandas","torch","array"};
const int length = sizeof(words)/sizeof(words[0]);
int main()
{
printf("%s",words[1]);
printf("%i",length);
printf("\n%zu",sizeof(words[0]));
printf("\n%zu %zu %s",sizeof(words[27]),strlen(words[27]),words[27]);
return 0;
}
[OUT]
that35
8
8 12 optimization
Each of the character arrays occupies 8 bytes, including the character array "optimization". I don't understand what is going on here. The strlen function gives the expected output, since it just finds the null character in the character array, but I expected the output of the sizeof operator to be 1 more than the output of strlen.
PS: I couldn't find a resource that addresses this issue.
It's happening because words is an array of pointers: words[27] is a pointer, so sizeof(words[27]) gives the size of a pointer. Pointers have a fixed size on a given platform, typically 8 bytes on an x86-64 CPU.
each of the character arrays occupy 8 bytes, including the character array "optimization".
No. Each string literal occupies its own storage (its length plus one byte for the '\0'). The 8 bytes are the size of the pointer stored in words (a const char *), which holds the address of the word.
const int length = sizeof(words)/sizeof(words[0]);
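The above line gives 35 because words is an array: when an array is the operand of sizeof, it does not decay to a pointer, so sizeof(words) is the size of all 35 pointers, and dividing by sizeof(words[0]) (one pointer) gives the element count. This works for local arrays too; it does not depend on words being a global variable.
Read more about array-to-pointer decay: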
https://www.geeksforgeeks.org/what-is-array-decay-in-c-how-can-it-be-prevented/
https://www.opensourceforu.com/2016/09/decayintopointers/
words is an array of pointers to const char, statically initialized so that each element points to one of the string literals.
In practice, the string literals will probably be stored in a read-only data section, and the pointers in words refer to them. To use words in this manner, it is entirely appropriate to use strlen.

Store integer values as a string in C

The following code prints integer values:
for ( i=0 ; i<COL ; i++ )
{
fprintf(iOutFile,"%02x ",(int)(iPtr[offset]));
}
I want to store these integer values as a string in a character pointer. To do so, I tried the following code, but it does not work.
char *hexVal="";
char *temp;
int val;
for ( i=0 ; i<COL ; i++ )
{
fprintf(iOutFile,"%02x ",(int)(iPtr[offset]));
val = (int)(iPtr[offset]);
temp=(char*) val;
hexVal = strcat(hexVal,temp);
}
printf("%s", hexVal);
When you write
char* hexVal = "";
you are setting hexVal to point to a string literal. Later in your code you try to strcat onto that address, which causes undefined behavior.
What you need to do is to allocate a large enough area to hold your resulting string and then let hexVal point to that.
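E.g.
char* hexVal = malloc(100); // or however many bytes you need
hexVal[0] = '\0';           // malloc'd memory is uninitialized; start with an empty string
then just do
strcat(hexVal, temp);
Alternatively, allocate it on the stack:
char hexVal[100] = "";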
You are approaching this wrong. In general, if you have an int i, you can't just cast it (or its address) to a char * and expect it to magically become a string that you can printf or use in strcat. For one thing, strings are null-terminated and don't have a fixed length, while ints have a fixed size, typically 32 bits.
You have to create a separate buffer (memory space) where snprintf will happily create a string-representation of your int value for you.
I think that your question is more about understanding how programming, pointers and C work in general, than about ints and strings and their conversion.
You are getting undefined behavior since there is no writable memory at hexVal, which just points at a read-only area containing a character with the value 0. It is a valid string, but it's constant and has length 0; you cannot append to it.
Also, of course you can't cast an integer into a "string", that's just not how C works. This:
temp=(char*) val;
simply reinterprets the value of val as a pointer (i.e. as an address); it doesn't magically compute the sequence of digit characters that represents the number, i.e. it doesn't convert val to a string.
You can use snprintf() to convert an integer to a string.
So, to summarize:
Change hexVal's declaration into e.g. char hexVal[32] = "";, this declares it as an array of 32 characters, giving you plenty of space into which to build a number as a string. It also initializes the array so the first character is 0, thus making it into an empty string (with space to grow).
Use e.g. snprintf(tmp, sizeof tmp, "%d", 4711); to format a number into a string (which should be e.g. char tmp[16];) in decimal form.
Use strcat(hexval, tmp); to concatenate the newly built numeric string onto hexval.
BEWARE that you can't concatenate forever, you will run out of space if you do it too long. Adjust the sizes, in that case.
Check return values where possible, read the manual pages (Google man FUNCTION where FUNCTION is a standard C library function like strcat).

allocating memory to pointer with exact character length

I am new to C and learning pointers at the moment. What I know is that a pointer points to the memory address of whatever it points to.
My question is: is this how you allocate memory of exactly the length of the string, or will it take 50 bytes?
Let's say the user entered the title: hunger games
BOOL AddNewDVD(Database* data){
}
I am new to c and learning pointers
Pointers are tough for beginners. Make sure you get a solid foundation.
at the moment what I know is that pointer points to the memory address of whatever it points to.
Though that is correct in practice, that's not how I like to think of it. What you are describing is how pointers are typically implemented, not what they are conceptually. By confusing the implementation with the concept you set yourself up for writing bad code later that makes unwarranted assumptions. There is no requirement that a pointer be a number which is an address in a virtual memory system.
A better way to think of a pointer is not as an address, but rather:
A pointer to t is a value.
Applying the * operator to a pointer to t gives you a variable of type t.
Applying the & operator to a variable of type t gives you a pointer to t.
A variable of type t can fetch or store a value of type t.
An array is a set of variables each identified by an index.
If a pointer p references the variable associated with index i in an array, then p + x gives you a pointer that references the variable associated with index i + x.
Applying the [i] operator to a pointer p is shorthand for *(p+i).
That is, rather than thinking of a pointer as a number that refers to a location in memory, just think of it as something that you can force to give you a variable.
is this how you allocate memory of exactly the length of the scanned string, or will it take 50 bytes?
char *title = malloc(50 * sizeof(char));
scanf(" %[^\n]s", title);
malloc(50*sizeof(char)) gives you an array of 50 chars.
title is a pointer to char.
When dereferenced, title will give you the variable associated with the first item in the array. (Item zero; remember, the index is the distance from the first item, and the first item has zero distance from the first item.)
scanf fills in the characters typed by the user into your array of 50 chars.
If they type in more than 49 chars (remembering that there will be a zero char placed at the end by convention) then arbitrarily bad things can happen.
As you correctly note, either you are wasting a lot of space or you are possibly overflowing the buffer. The solution is: don't use scanf for any production code. It is far too dangerous. Instead use fgets. See this question for more details:
How to use sscanf correctly and safely
You need to have a buffer to know how long the entered name is. This is your title, which can hold at most 49 chars. Then you compute len and see that it may be much shorter (for example, 12 for "hunger games"). You allocate string to be exactly this size + 1.
Of course, you can then copy the content of title into string, even though title is a 50-byte buffer and string is only len + 1 bytes long: the copy ends with the \0 terminator, which is guaranteed to fit inside string.
You cannot use scanf to determine the length of a string and then allocate memory for it. You need to either:
Ask the user the length of the string. Obviously, this is a poor choice.
Create a static buffer that is more than large enough and then create a dynamic string that is the exact length you need. The problem is determining the maximum length the string may be. fgets might be what you need. Consider the following code:
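#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_STR_LEN (50)
int main(void)
{
    char buf[MAX_STR_LEN] = {0};
    char *str, *cPtr;
    /* Get User Input (leave room for the newline and the terminating '\0') */
    printf("Enter a string, no longer than %d characters: ", MAX_STR_LEN - 2);
    if (fgets(buf, MAX_STR_LEN, stdin) == NULL)
        return 1;
    /* Remove Newline Character If Present */
    cPtr = strchr(buf, '\n');
    if (cPtr)
        *cPtr = '\0';
    /* Allocate Memory For Exact Length Of String (+1 for the '\0') */
    str = malloc(strlen(buf) + 1);
    if (str == NULL)
        return 1;
    strcpy(str, buf); /* copies the terminating '\0' too, unlike strncpy(str, buf, strlen(buf)) */
    /* Display Result */
    printf("Your string is \"%s\"\n", str);
    free(str);
    return 0;
}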

How to get the string size in bytes?

As the title implies, my question is how to get the size of a string in C. Is it good to use sizeof if I've declared it (the string) in a function without malloc in it? Or, if I've declared it as a pointer? What if I initialized it with malloc? I would like to have an exhaustive response.
You can use strlen. The length is determined by the terminating null character, so the string you pass must be properly null-terminated.
If you want to get size of memory buffer, that contains your string, and you have pointer to it:
If it is a dynamic array (created with malloc), it is impossible to get its size from the pointer alone, since the compiler doesn't know what the pointer is pointing at; you have to keep track of the size yourself.
If it is a static array (declared with array syntax, e.g. char buf[50]), you can use sizeof to get its size.
Use strlen to get the length of a null-terminated string.
sizeof returns the size of the array, not the length of the string. If it's a pointer (char *s) rather than an array (char s[]), it won't work, since it will return the size of the pointer (usually 4 bytes on 32-bit systems, 8 on 64-bit ones). Also, an array passed to a function decays to a pointer to its first element (and a function cannot return an array at all), so inside the called function you lose the ability to use sizeof to check the size of the array.
So only if the string spans the entire array (e.g. char s[] = "stuff") will sizeof on a statically defined array return what you want (and it will be faster, as it doesn't need to loop through to find the null terminator); you will need to subtract 1 for the null terminator. If the string doesn't span the entire array, sizeof won't return what you want.
An alternative to all this is actually storing the size of the string.
While sizeof works for this specific type of string:
char str[] = "content";
int charcount = sizeof str - 1; // -1 to exclude terminating '\0'
It does not work if str is a pointer (sizeof returns the size of the pointer, usually 4 or 8) or an array with a specified length (sizeof returns that specified length, because for char the number of elements and the number of bytes are the same).
Just use strlen().
If you use sizeof, then char *str and char str[] give different answers. For char str[] initialized from a string literal, sizeof gives the length of the string (including the terminator), while for char *str it gives the size of the pointer (which depends on the platform).
I like to use:
(strlen(string) + 1 ) * sizeof(char)
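This will give you the buffer size in bytes. Be careful when sizing a buffer for snprintf(), though: the size must come from the formatted result, not from the format string (strlen("%s, World!") + 1 is only 11, but "Hello, World!" needs 14 bytes). snprintf(NULL, 0, ...) returns the length the output would have:
const char *format = "%s, World!";
int needed = snprintf(NULL, 0, format, "Hello"); /* 13: length without the '\0' */
char *string = malloc((size_t)needed + 1);
if (string != NULL)
    snprintf(string, (size_t)needed + 1, format, "Hello");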
For reference, the function's signature is: size_t strlen(const char *s)
There are two ways of finding the size of a string in bytes (the examples below are C++, but strlen and sizeof behave the same way in C):
1st Solution:
# include <iostream>
# include <cctype>
# include <cstring>
using namespace std;
int main()
{
char str[] = {"A lonely day."};
cout<<"The string bytes for str[] is: "<<strlen(str);
return 0;
}
2nd Solution:
# include <iostream>
# include <cstring>
using namespace std;
int main()
{
char str[] = {"A lonely day."};
cout<<"The string bytes for str[] is: "<<sizeof(str);
return 0;
}
The two solutions produce different outputs. Here is why.
The 1st solution uses strlen, which, according to cplusplus.com, works like this:
The length of a C string is determined by the terminating null-character: A C string is as long as the number of characters between the beginning of the string and the terminating null character (without including the terminating null character itself).
That explains why the 1st Solution prints 13, the number of characters, while the 2nd Solution prints a larger number. If that still isn't clear, read on.
The 2nd Solution uses sizeof to find the size in bytes. This SO answer says (paraphrased):
sizeof("f") must return 2 string size bytes, one for the 'f' and one for the terminating '\0' (terminating null-character).
That is why the 2nd Solution prints 14: 13 bytes for the characters of the string and 1 for the '\0'.
Conclusion:
To get the correct answer for 2nd Solution, you must do sizeof(str)-1.
References:
Sizeof string literal
https://cplusplus.com/reference/cstring/strlen/?kw=strlen

I'm new to C, can someone explain why the size of this string can change?

I have never really done much C but am starting to play around with it. I am writing little snippets like the one below to try to understand the usage and behaviour of key constructs/functions in C. The one below I wrote trying to understand the difference between char* string and char string[] and how the lengths of strings work. Furthermore, I wanted to see if sprintf could be used to concatenate two strings and store the result in a third string.
What I discovered was that the third string I was using to store the concatenation of the other two had to be set with char string[] syntax or the binary would die with SIGSEGV (Address boundary error). Setting it using the array syntax required a size so I initially started by setting it to the combined size of the other two strings. This seemed to let me perform the concatenation well enough.
Out of curiosity, though, I tried expanding the "concatenated" string to be longer than the size I had allocated. Much to my surprise, it still worked and the string size increased and could be printf'd fine.
My question is: Why does this happen, is it invalid or have risks/drawbacks? Furthermore, why is char str3[length3] valid but char str3[7] causes "SIGABRT (Abort)" when the sprintf line executes?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void main() {
char* str1 = "Sup";
char* str2 = "Dood";
int length1 = strlen(str1);
int length2 = strlen(str2);
int length3 = length1 + length2;
char str3[length3];
//char str3[7];
printf("%s (length %d)\n", str1, length1); // Sup (length 3)
printf("%s (length %d)\n", str2, length2); // Dood (length 4)
printf("total length: %d\n", length3); // total length: 7
printf("str3 length: %d\n", (int)strlen(str3)); // str3 length: 6
sprintf(str3, "%s<-------------------->%s", str1, str2);
printf("%s\n", str3); // Sup<-------------------->Dood
printf("str3 length after sprintf: %d\n", // str3 length after sprintf: 29
(int)strlen(str3));
}
This line is wrong:
char str3[length3];
You're not taking the terminating zero into account. It should be:
char str3[length3+1];
You're also trying to get the length of str3, while it hasn't been set yet.
In addition, this line:
sprintf(str3, "%s<-------------------->%s", str1, str2);
will overflow the buffer you allocated for str3. Make sure you allocate enough space to hold the complete string, including the terminating zero.
void main() {
char* str1 = "Sup"; // a pointer to the statically allocated sequence of characters {'S', 'u', 'p', '\0' }
char* str2 = "Dood"; // a pointer to the statically allocated sequence of characters {'D', 'o', 'o', 'd', '\0' }
int length1 = strlen(str1); // the length of str1 without the terminating \0 == 3
int length2 = strlen(str2); // the length of str2 without the terminating \0 == 4
int length3 = length1 + length2;
char str3[length3]; // declare an array of 7 characters, uninitialized
So far so good. Now:
printf("str3 length: %d\n", (int)strlen(str3)); // What is the length of str3? str3 is uninitialized!
C is a primitive language. It doesn't have strings. What it does have is arrays and pointers. A string is a convention, not a datatype. By convention, people agree that "an array of chars is a string, and the string ends at the first null character". All the C string functions follow this convention, but it is a convention. It is simply assumed that you follow it, or the string functions will break.
So str3 is not a 7-character string. It is an array of 7 characters. If you pass it to a function which expects a string, then that function will look for a '\0' to find the end of the string. str3 was never initialized, so it contains random garbage. In your case, apparently, there was a '\0' after the 6th character so strlen returns 6, but that's not guaranteed. If it hadn't been there, then it would have read past the end of the array.
sprintf(str3, "%s<-------------------->%s", str1, str2);
And here it goes wrong again. You are trying to copy the string "Sup<-------------------->Dood\0" into an array of 7 characters. That won't fit. Of course the C function doesn't know this, it just copies past the end of the array. Undefined behavior, and will probably crash.
printf("%s\n", str3); // Sup<-------------------->Dood
And here you try to print the string stored at str3. printf is a string function. It doesn't care (or know) about the size of your array. It is given a string, and, like all other string functions, determines the length of the string by looking for a '\0'.
Instead of trying to learn C by trial and error, I suggest that you go to your local bookshop and buy an "introduction to C programming" book. You'll end up knowing the language a lot better that way.
There is nothing more dangerous than a programmer who half understands C!
What you have to understand is that C doesn't actually have strings; it has character arrays. Moreover, the character arrays don't have associated length information. Instead, string length is determined by iterating over the characters until a null byte is encountered. This implies that every char array holding a string must be at least strlen + 1 characters long.
C doesn't perform array bounds checking. This means that the functions you call blindly trust you to have allocated enough space for your strings. When that isn't the case, you may end up writing beyond the bounds of the memory you allocated for your string. For a stack-allocated char array, you may overwrite other local variables or saved return addresses. For a heap-allocated char array, you may corrupt other allocations or the allocator's bookkeeping, or touch memory your process doesn't own at all. In either case, the best case is that you'll error out immediately, and the worst case is that things appear to be working, but actually aren't.
As for the assignment, you can't write something like this:
char *str;
sprintf(str, ...);
and expect it to work: str is an uninitialized pointer, so its value is indeterminate, which in practice means "garbage". Pointers are memory addresses, so an attempt to write through an uninitialized pointer is an attempt to write to a random memory location. Not a good idea. Instead, what you want to do is something like:
char *str = malloc(sizeof(char) * (length + 1)); /* length: number of characters you need */
which allocates length + 1 characters' worth of storage and stores the pointer to that storage in str. Of course, to be safe, you should check whether malloc returns NULL. And when you're done, you need to call free(str).
The reason your code works with the array syntax is because the array, being a local variable, is automatically allocated, so there's actually a free slice of memory there. That's (usually) not the case with an uninitialized pointer.
As for the question of how the size of a string can change, once you understand the bit about null bytes, it becomes obvious: all you need to do to change the size of a string is futz with the null byte. For example:
char str[] = "Foo bar";
str[1] = '\0'; // the null character, same as (char)0
At this point, the length of the string as reported by strlen will be exactly 1. Or:
char str[] = "Foo bar";
str[7] = '!';
after which calling strlen is undefined behavior: it will keep trying to read bytes beyond the array boundary. It might encounter a null byte and stop (returning the wrong string length), or it might crash.
I've written all of one C program, so expect this answer to be inaccurate and incomplete in a number of ways, which will undoubtedly be pointed out in the comments. ;-)
Your str3 is too short: you need to add an extra byte for the null terminator, plus the length of the "<-------------------->" string literal.
Out of curiosity, though, I tried expanding the "concatenated" string to be longer than the size I had allocated. Much to my surprise, it still worked and the string size increased and could be printf'd fine.
The behaviour is undefined so it may or may not segfault.
strlen returns the length of the string without the trailing null byte ('\0', 0x00), but when you create a variable to hold the combined strings, you need to add that 1 character.
char str3[length3 + 1];
…and you should be all set.
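C strings are '\0'-terminated and require an extra byte for that, so at the very least you should declare
char str3[length3 + 1];
which is enough for the two strings joined directly (anything else sprintf writes, such as the arrow in your format string, needs room too).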
In sprintf() you are writing beyond the space allocated for str3. This may cause any kind of undefined behavior (if you are lucky, it will crash). strlen() just searches for a null character starting from the memory location you pass it, and here it finds one at position 29. It could just as well have been 129; the behavior is unpredictable.
A few important points:
Just because it works doesn't mean it's safe. Going past the end of a buffer is always unsafe, and even if it works on your computer, it may fail under a different OS, different compiler, or even a second run.
I suggest you think of a char array as a container and a string as an object that is stored inside the container. In this case, the container must be 1 character longer than the object it holds, since a "null character" is required to indicate the end of the object. The container is a fixed size, and the object can change size (by moving the null character).
The first null character in the array indicates the end of the string. The remainder of the array is unused.
You can store different things in a char array (such as a sequence of numbers). It just depends on how you use it. But string functions such as printf() or strcat() assume that there is a null-terminated string to be found there.
