Difference between sizeof and strlen in C on Linux

The first printf statement is giving output 3 and second giving 20.
Can anybody please explain what's the difference between the two here?
char frame[20], str[20];
printf("\nstrlen(frame) = %zu", strlen(frame));
printf("\nsizeof(frame) = %zu", sizeof(frame));
Thanks :)

sizeof is a compile-time operator and determines the size in bytes that a type consumes. In the case of frame (char[20]) that is 20 bytes.
strlen is a run-time function: it scans from the given pointer until the first nul terminator '\0' and returns the number of characters before it.

The contents of frame are not initialized, so it is not a valid C string, and strlen(frame) could return any value, or crash. Its behavior is undefined in this case.
frame is an array of 20 characters, so sizeof(frame) returns 20 * sizeof(char), which is always 20 (sizeof(char) always equals 1).

strlen actually gives you the length of the string, whereas sizeof gives you the size of the allocated memory in bytes. It is in fact quite nicely explained here: http://www.cplusplus.com/reference/cstring/strlen/ An extract is given below.
The length of a C string is determined by the terminating null-character: A C string is as long as the number of characters between the beginning of the string and the terminating null character (without including the terminating null character itself).
This should not be confused with the size of the array that holds the string. For example:
char mystr[100]="test string";
defines an array of characters with a size of 100 chars, but the C string with which mystr has been initialized has a length of only 11 characters. Therefore, while sizeof(mystr) evaluates to 100, strlen(mystr) returns 11.
And yes as per the other comments, you are trying to get length for uninitialized strings and that leads to undefined behaviour, it can be 3 or anything else, depending on whatever garbage is present in the memory that got allocated for your string.
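As a minimal sketch of the distinction discussed in this thread (the helper name report_lengths is made up for illustration, it is not from the original posts): once the array is initialized, strlen is well defined, while sizeof does not depend on the contents at all.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: stores the string length and the array size of a
   20-byte buffer that is initialized before strlen is called. */
void report_lengths(size_t *len, size_t *size)
{
    char frame[20] = "abc";   /* the rest of the array is zero-filled */

    *len  = strlen(frame);    /* counts characters up to the first '\0' */
    *size = sizeof(frame);    /* size of the whole array in bytes */

    /* size_t values are printed with %zu, not %d */
    printf("strlen(frame) = %zu\n", *len);
    printf("sizeof(frame) = %zu\n", *size);
}
```

Called once, this should report a length of 3 and a size of 20, whatever happened to be in memory before.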

Related

C - strcpy() function restrictions

I am incredibly new in C programming and I'm having a hard time understanding some aspects of it, including the strcpy() function.
I am doing some quizzes and passed over the following question:
To assure the correctness of the following strcpy(d,s) call, which of the following conditions must always be met:
a. sizeof(d) >= strlen(s) + 1
b. sizeof(d) >= sizeof(s)
c. sizeof(d) >= strlen(s)
d. strlen(d) >= strlen(s)
e. strlen(d) >= strlen(s) + 1
After doing some research, I found that the size of the destination string should be large enough to store the copied string. Source here. This led me to answers either b or d.
However the correct answer is 'a' and I cannot understand why, and cannot find any documentation. Could someone please explain in more details what the restrictions of strcpy() are?
It sort of depends on how the variables are declared and/or defined even, however judging by the fact that the answer is a, I'm positive that this is how they were declared and defined-
// Assume SIZE_0 and SIZE_1 are some integer values
char d[SIZE_0];
char s[SIZE_1];
// Assign a bunch of characters to `s` here and null terminate it
// Assume `s` now has `LEN` number of characters + 1 for the null terminator, for a total of `LEN + 1`
// Of course, `LEN + 1` is either less than, or equal to, `SIZE_1`
Now let's get the values cleared up real quick-
strlen(s) -> Returns LEN, as it counts the number of characters until the null terminator
sizeof(s) -> Returns SIZE_1
sizeof(d) -> Returns SIZE_0
strlen(d) -> Not valid yet: d is uninitialized, so it does not hold a null-terminated string and calling strlen on it is undefined behavior. It has no length, not even 0, unless you set d[0] to '\0' yourself
So, it's evident that for strcpy(d, s) to work, sizeof(d) (which is the only valid call) MUST BE more than, or equal to, LEN + 1 (LEN for all the characters from s and +1 for the null terminator).
Of course, strlen on s will return LEN, so we'll need strlen(s) + 1.
And that is why you need sizeof(d) to be more than, or equal to, strlen(s) + 1.
I must say though, if strlen did return the capacity (aka size) of the string, instead of the current length, your assumption would work.
At the end of the day, always remember the difference between capacity (size) and length, especially in C.
strcpy copies a sequence of character bytes together with a null terminator. For example, the string "ABC" is represented as the bytes 65, 66, 67, 0 (in ASCII).
strlen calculates the length of the string excluding the null terminator (e.g. 3 for "ABC"). sizeof (for an array) gives you the amount of memory set aside for the array. The string it contains may be shorter.
char s[20] = "ABC"; would give you a string of length 3, using 4 bytes including the terminator, in a reserved memory space of 20 bytes.
To safely copy from s to d, you need to have enough space in d to receive the string and its terminator (without the terminator you won't have a valid string). Hence strlen(s)+1.
The size of s is irrelevant as not all of the reserved space will be copied. strlen(d) is irrelevant also - any existing string in d will be overwritten.
So option a is correct.
All character strings in C (the kind used by functions such as strcpy) must have a nul-terminator (a zero-value char signalling the end of the string). So, to store a string like "abc", you will need any char[] array to have at least four elements: one for each of the letters plus one for the nul-terminator.
The strlen() function returns the number of characters in the given string not including that nul-terminator; but the strcpy() function copies all characters including the terminator, so the destination buffer must be at least one char greater than the strlen of the source.
Also note that the sizeof(d) calculation will only work if d is declared as an array of char (e.g. char d[42]); if that array is passed to a function, it will 'decay' to a pointer, and the array's size will not be (implicitly) known to that function; see this discussion: How to find the 'sizeof' (a pointer pointing to an array)?.
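To make condition (a) concrete, here is a small sketch. The helper copy_if_fits and its parameter names are hypothetical, and it assumes the caller passes the real size of the destination array, since sizeof cannot recover it once the array has decayed to a pointer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if condition (a) holds,
   i.e. dst_size >= strlen(src) + 1. Returns 0 on success, -1 otherwise.
   dst_size is passed explicitly because, inside this function, dst is a
   pointer and sizeof(dst) would be the size of the pointer. */
int copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;   /* characters plus the terminator */

    if (dst_size < needed) {
        return -1;                     /* strcpy would overflow dst */
    }
    strcpy(dst, src);                  /* safe: the terminator fits too */
    return 0;
}
```

With char d[4], a call such as copy_if_fits(d, sizeof d, "abc") succeeds, while passing "abcd" is rejected because it needs 5 bytes.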

When using strcpy() does the destination string need to be one element bigger than the source string?

Consider this code snippet (simplified syntax for clarity).
void simple(char *bar) {
    char MyArray[12];
    strcpy(MyArray, bar);
}
My instructor says that MyArray can copy at most 12 elements from bar, but from what I've read, MyArray can only store 11 characters because it needs room for the null character at the end. So if the received value of bar is 12 or greater, a buffer overflow would occur. My instructor says that this will only happen if the received value of bar is 13 or greater. Who's right? I'd appreciate if you could cite a credible source so I can convince him.
The definition char MyArray[12] creates an array of 12 char, which can be used to store a string. Since strings in C are null terminated, one of those characters needs to be able to store the null byte at the end of the string.
So a variable of type char [12] can hold a string of at most 11 characters. Attempting to copy a string of length 12 or longer using strcpy as in your example will overflow the bounds of the array.
If you were to use strncpy as follows:
strncpy(MyArray, bar, 12);
This will not overflow the buffer, as it would copy at most 12 characters. However, if 12 characters are copied, that means the string is not null terminated and is therefore not technically a string. Then attempting to use any other string function on MyArray that expects a null terminated string would read off the end of the array.
The/a proper use of strncpy would be:
void simple(char *bar) {
    char MyArray[12];
    strncpy(MyArray, bar, sizeof(MyArray) - 1);
    MyArray[sizeof(MyArray) - 1] = '\0';
}
This just puts in a terminating null character, whether strncpy was able to do that or not.
It's hard to tell, because your question is a bit confusingly worded, but I think you're right, and that your instructor is wrong.
Given the code
void simple(char *bar) {
    char MyArray[12];
    strcpy(MyArray, bar);
}
if the passed-in bar points to a string of 11 or fewer characters, a valid string will be copied to MyArray, with no buffer overflow. But if the string is 12 (or more) characters long, you're right, there'll be a buffer overflow, because strcpy will also copy the 13th, terminating null character.
Earlier you asked about strncpy. Given the code
void simple2(char *bar) {
    char MyArray[12];
    strncpy(MyArray, bar, 12);
}
if the passed-in bar points to a string of 11 or fewer characters, a valid string will be copied to MyArray. But if the string is 12 characters long, we have a different problem. strncpy will copy 12 characters and stop, meaning that it won't copy the terminating null character. There won't be a buffer overflow, but MyArray still won't end up containing a valid string.
Also, you asked for a credible source. I wrote the C FAQ list -- would you consider that credible? :-)
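The 11-versus-12 boundary described in this answer can be checked with a short sketch. The function copy_truncating is a hypothetical name, and the sketch assumes dst_size is at least 1; it combines strncpy with explicit termination and reports whether the source had to be cut.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies at most dst_size - 1 characters and always
   terminates dst. Returns 1 if src was truncated, 0 if it fit.
   Assumes dst_size > 0. */
int copy_truncating(char *dst, size_t dst_size, const char *src)
{
    strncpy(dst, src, dst_size - 1);   /* may leave dst unterminated */
    dst[dst_size - 1] = '\0';          /* so terminate it explicitly */
    return strlen(src) >= dst_size;    /* no room for all chars + '\0' */
}
```

For a 12-element array, an 11-character source fits completely, while a 12-character source is cut to 11 characters plus the terminator.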
My instructor says that MyArray can copy at most 12 elements from bar
It would be more correct to say that the array MyArray can accommodate at most 12 elements of the array bar. Otherwise there will be an attempt to access memory beyond the array.
So in fact your instructor is right.
The array MyArray is declared having only 12 elements
char MyArray[12];
but from what I've read, MyArray can only store 11 characters because it needs room for the null character at the end
The terminating zero is also a character. And the function strcpy copies all characters from the source string including the terminating zero that is present in the source string.
So if the received value of bar is 12 or greater, a buffer overflow would occur
What does the magic number 12 mean in this context? Is it the number of characters in the array bar, or is it the length of the string stored in the array bar (which, used as an argument, is converted to a pointer to its first element)?
If the number 12 means the length of the string stored in the array bar, then the function strcpy will try to copy all characters of the string including the terminating zero, and in this case the array MyArray has to be declared as having 13 elements.
char MyArray[13];
However, if the number 12 means the number of elements in the array bar (used as an argument of the function) and it contains a string, then the length of the string is evidently less than 12. So the array MyArray can accept all characters of the source array including the terminating zero.
So the reason for the confusion is that you and your instructor did not agree on what the number 12 means: the length of the source string or the size of the source array.
In the first case there will be indeed undefined behavior.
In the second case if the source array contains a string then the code will be well-formed.
A char array and string are similar, but not the same.
In C,
A string is a contiguous sequence of characters terminated by and including the first null character. C11dr §7.1.1 1
void simple(char *bar) {
    char MyArray[12];
    strcpy(MyArray, bar);
}
My instructor says that MyArray can copy at most 12 elements from bar,
This is correct: MyArray[] can receive up to 12 characters.
strcpy() copies the memory starting at bar to the array MyArray[] and continues until it has copied a null character. If more than 12 characters (the count of 12 includes the null character) are attempted to be copied, the result is undefined behavior (UB).
MyArray can only store 11 characters
Not quite. MyArray[] can store 12 characters. To treat that data as a string, a null character must be one of those 12. When interpreted as a string, the string includes all the characters up to the null character. It also includes the null character. Each element of MyArray[] could be an 'x', but then that memory would not be a string as it lacks a null character.
So if the received value of bar is 12 or greater, a buffer overflow would occur.
Not quite. If the strcpy() attempts to write outside MyArray[], the result is undefined. Buffer overflow may occur. The program may stop, etc. The result is not defined. It is undefined behavior.
My instructor says that this will only happen if the received value of bar is 13 or greater.
bar is a pointer - it likely does not have a "value of 13". bar likely points to memory that is a string. A string includes its terminating null character, so the string may consist of 12 non-null characters and a final null character for a total of 13 characters. MyArray[] is insufficient to store a copy of that string.
Who's right?
I suspect the disconnect is in the imprecise meaning of "bar is 13". I see nothing in what the instructor reportedly said that is incorrect.

What happens when I write `char str[80];`?

What happens behind the scenes when I write: char str[80];?
I notice that I can now set str = "hello"; and also str = "hello world"; right afterwards. First time strlen(str) is 5, and second time it is 11;
But why? I thought that after str = "hello";, the char at index 5 becomes null (str[5] becomes '\0'). Doesn't that mean that str's size is now 6 and I shouldn't be able to set it to "hello world"?
And if not, then how does strlen and sizeof calculate the correct values every time?
I think you're getting confused between two different concepts: the allocated length of the array (how much total space is available), and the logical length of the string (how much space is being used).
When you write char str[80], you're getting storage space for 80 characters. You might not end up using all of that space, but regardless of what string you try storing in it, you're always going to have 80 slots into which you can place characters.
If you store the string "hello" into str, then the first six characters of str will be set to h, e, l, l, o, and a null terminating character. This doesn't change the allocated length, though - you still have 74 other slots that you can work with. If you then change it to "hello, world", you're using an extra seven characters, which fits just fine because you easily have enough allocated space to hold things. You've just changed the logical length, how much of that space is being used for meaningful data, but not the allocated length, how much space there is available.
Think of it this way. When you say char str[80], you're buying a plot of land that's, say, 80 acres. If you then put "hello" into it, you're using six acres of that available 80 acres. The rest of the land is still yours - you can build whatever you'd like there - so if you decide to tear everything down and build a longer string that uses up more acres of land, that's fine. No one is going to object.
The strlen function gives back the logical length of the string - how many characters are in the string that you're storing. It works by counting up characters until it finds a null terminator indicating the logical end of the string. The sizeof operator returns the allocated length of the array, how many slots you have. It works at compile-time and doesn't care what the array contents are.
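Here is a rough sketch of the allocated-versus-logical distinction. Note that an array cannot be assigned with = after its declaration in C, so the sketch uses strcpy to store the strings; the function name reuse_buffer is made up for the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical demo: stores two strings in the same 80-byte array and
   records the logical length after each copy. */
void reuse_buffer(size_t *first_len, size_t *second_len, size_t *capacity)
{
    char str[80];

    strcpy(str, "hello");          /* writes 6 bytes: 5 letters + '\0' */
    *first_len = strlen(str);

    strcpy(str, "hello world");    /* writes 12 bytes over the same storage */
    *second_len = strlen(str);

    *capacity = sizeof(str);       /* still 80, regardless of contents */
    printf("%zu %zu %zu\n", *first_len, *second_len, *capacity);
}
```

The two lengths come out as 5 and 11, and the capacity stays at 80 throughout.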
When you declare a variable as char str[80], space for an 80 character array is allocated on the stack. This memory will be automatically released when that particular stack frame is out of scope.
When you copy the string literal "hello" into it (a plain assignment such as str = "hello" does not compile for an array; you would use an initializer or strcpy), each character is copied into the array and a null terminator is placed at the end of the string (str[5] == '\0'). String length and array size are two different things, which is why you can then store "hello world" in the same array. String length is simply how many consecutive characters there are before the null terminator. If you instead declared str as char str[5], copying "hello world" into it would write past the end of the array, which is undefined behavior and may crash. It may be helpful to view a simple implementation of strlen:
size_t strlen(const char *str)
{
    size_t return_val = 0;
    while (str[return_val] != '\0') return_val++;
    return return_val;
}
Of course, if there is no null terminating character, the above naive implementation will read past the end of the array, which is undefined behavior and may crash.
I am assuming that you are working in C. When you compile "char str[80];", space for 80 characters is allocated for you. sizeof(str) will always tell you that it is an 80-byte chunk of memory. strlen(str) counts the non-zero characters starting at str[0] and stops at the first zero byte. This is why "Hello" is 5 and "Hello world" is 11.
I would suggest that you learn to use functions like strnlen, strncpy, strncmp, snprintf ...; this way you can prevent reading/writing beyond the end of the array, for example: strnlen(str, sizeof(str)).
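strnlen is a POSIX function rather than part of standard C, so as a portable sketch the same bounded scan can be written with memchr. The name bounded_len is hypothetical; the behavior is meant to mirror strnlen(str, sizeof(str)).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper with the same contract as POSIX strnlen:
   returns the string length, but never looks past max bytes. */
size_t bounded_len(const char *s, size_t max)
{
    const char *end = memchr(s, '\0', max);   /* NULL if no terminator found */
    return end ? (size_t)(end - s) : max;
}
```

If the array holds no terminator at all, the result is simply the array size instead of a read past the end.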
Also start working through online tutorials and find an introductory C/C++ book to learn from.
When you declare an array like char str[80]; 80 chars of space are reserved on the stack for you, but they are not initialized - they get whatever was already in memory at the time. It's your job as the programmer to initialize the array.
strlen does something along these lines:
size_t strlen(const char *s)
{
    size_t len = 0;
    while (*s++) len++;
    return len;
}
In other words, it returns the length of a null-terminated string in a character array, even if the length is less than the size of the total array.
sizeof returns the size of a type or expression. If your array is 80 chars long, and a char is a byte long, it will return 80, even if none of the values in the array have been initialized. If you had an array of 5 ints, and an int was 4 bytes long, sizeof would produce 20.

Does strlen return same value for a binary and ascii data

Please find the below code snippet.
unsigned char bInput[20];
unsigned char cInput[20];
From a function, I get a binary data in bInput and I determined its length using strlen(bInput).
I converted bInput which is in binary to ASCII and stored in cInput and printed its length. But both are different.
I am new to programming. Please guide regarding its behaviour.
Function strlen returns the index of the first character in memory with a value of 0 (AKA '\0'), starting from the memory address indicated by the input argument passed to this function.
If you pass a memory address of "something else" other than a zero-terminated string of characters (which has been properly allocated at that memory address), then there's a fair chance that it will result in a memory-access violation (AKA segmentation fault).
The result won't be the same for both cases.
Below is one sample scenario:
A null byte is valid UTF-8; it just doesn't work with C 'strings'.
char temp[8] = "abcde\0f";
What we have here is a buffer of length 8, which contains these char values:
97 98 99 100 101 0 102 0
Here strlen(temp) is 5, because strlen is designed to stop at the first zero byte; however, the actual length of the buffer is eight.
strlen() counts each byte until it reaches the null character ('\0', a byte whose value is zero). So if you are getting different lengths for the binary and ASCII data, you need to check the following two points in your conversion logic:
what you do when a binary value is zero;
whether you convert any nonzero binary value to zero.
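The question does not say which binary-to-ASCII conversion is used, so the following is only a sketch under the assumption that it is a hex encoding. The helper bytes_to_hex is hypothetical. The point it illustrates is that the length of binary data has to be carried separately, because a zero byte is legitimate data there.

```c
#include <stddef.h>

/* Hypothetical helper: writes 2 hex digits per input byte plus a '\0'.
   out must have room for at least 2 * len + 1 bytes. The length of the
   binary data is passed in, because strlen would stop at the first 0x00. */
void bytes_to_hex(const unsigned char *in, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";

    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = digits[in[i] >> 4];     /* high nibble */
        out[2 * i + 1] = digits[in[i] & 0x0F];   /* low nibble */
    }
    out[2 * len] = '\0';
}
```

For the three bytes 0x41, 0x00, 0x7f the hex text has length 6, whereas strlen on the raw bytes would stop after the first one.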

Null termination of char array

Consider following case:
#include <stdio.h>
int main()
{
    char A[5];
    scanf("%s", A);
    printf("%s", A);
}
My question is if char A[5] contains only two characters. Say "ab", then A[0]='a', A[1]='b' and A[2]='\0'.
But if the input is say, "abcde" then where is '\0' in that case. Will A[5] contain '\0'?
If yes, why?
sizeof(A) will always return 5 as answer. Then when the array is full, is there an extra byte reserved for '\0' which sizeof() doesn't count?
If you type more than four characters then the extra characters and the null terminator will be written outside the end of the array, overwriting memory not belonging to the array. This is a buffer overflow.
C does not prevent you from clobbering memory you don't own. This results in undefined behavior. Your program could do anything—it could crash, it could silently trash other variables and cause confusing behavior, it could be harmless, or anything else. Notice that there's no guarantee that your program will either work reliably or crash reliably. You can't even depend on it crashing immediately.
This is a great example of why scanf("%s") is dangerous and should never be used. It doesn't know about the size of your array which means there is no way to use it safely. Instead, avoid scanf and use something safer, like fgets():
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored after the last character in the buffer.
Example:
if (fgets(A, sizeof A, stdin) == NULL) {
    /* error reading input */
}
Annoyingly, fgets() will leave a trailing newline character ('\n') at the end of the array. So you may also want code to remove it.
size_t length = strlen(A);
/* check length first: A[length - 1] is out of bounds for an empty string */
if (length > 0 && A[length - 1] == '\n') {
    A[length - 1] = '\0';
}
Ugh. A simple (but broken) scanf("%s") has turned into a 7 line monstrosity. And that's the second lesson of the day: C is not good at I/O and string handling. It can be done, and it can be done safely, but C will kick and scream the whole time.
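For what it's worth, the newline removal can be shortened with strcspn. This is just a sketch, and strip_newline is a made-up helper name, not something from the answer above.

```c
#include <string.h>

/* Hypothetical helper: removes the first '\n' (if any) from s.
   strcspn returns the index of the first '\n', or strlen(s) if there is
   none, so this is also safe for an empty string. */
void strip_newline(char *s)
{
    s[strcspn(s, "\n")] = '\0';
}
```

After a successful fgets(A, sizeof A, stdin), calling strip_newline(A) leaves the line without its trailing newline.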
As already pointed out - you have to define/allocate an array of length N + 1 in order to store N chars correctly. It is possible to limit the amount of characters read by scanf. In your example it would be:
scanf("%4s", A);
in order to read max. 4 chars from stdin.
A character array in C is just a block of memory (in most expressions its name decays to a pointer to the first element). If you tell the compiler to reserve 5 bytes for characters, it does. If you try to put more than 5 bytes in there, it will just overwrite the memory past the 5 bytes you reserved.
That is why C can have serious security implications. You have to know that you are only going to write 4 characters + a \0. C will let you overwrite memory until the program crashes.
Please don't think of char foo[5] as a string. Think of it as a spot to put 5 bytes. You can store 5 characters in there without a null, but you have to remember you need to do a memcpy(otherCharArray, foo, 5) and not use strcpy. You also have to know that the otherCharArray has enough space for those 5 bytes.
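A tiny sketch of that "5 bytes, not a string" view follows. The function name copy_five_bytes is hypothetical; it copies a full, unterminated array with an explicit length.

```c
#include <string.h>

/* Hypothetical demo: copies exactly 5 bytes that are not null-terminated.
   Returns 1 if the copy matches the source. */
int copy_five_bytes(void)
{
    char foo[5] = { 'a', 'b', 'c', 'd', 'e' };   /* full, no '\0' */
    char other[5];

    memcpy(other, foo, sizeof foo);   /* length is explicit, no terminator needed */
    return memcmp(other, foo, sizeof foo) == 0;
}
```

Neither array may be passed to strcpy, strlen or printf("%s") here, since none of them contains a terminator.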
You'll end up with undefined behaviour.
As you say, the size of A will always be 5, so if you read 5 or more chars, scanf will try to write to memory that it's not supposed to modify.
And no, there's no reserved space/char for the \0 symbol.
Any string greater than 4 characters in length will cause scanf to write beyond the bounds of the array. The resulting behavior is undefined and, if you're lucky, will cause your program to crash.
If you're wondering why scanf doesn't stop writing strings that are too long to be stored in the array A, it's because there's no way for scanf to know sizeof(A) is 5. When you pass an array as the parameter to a C function, the array decays to a pointer pointing to the first element in the array. So, there's no way to query the size of the array within the function.
In order to limit the number of characters read into the array use
scanf("%4s", A);
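The effect of the field width can be tried without typing input by using sscanf, which accepts the same conversion specifiers. This is a sketch only: read_word4 is a hypothetical wrapper, and it assumes the output array has at least 5 elements.

```c
#include <stdio.h>

/* Hypothetical demo: parses from a string instead of stdin so the result
   is reproducible. out must have room for at least 5 bytes. */
int read_word4(const char *input, char *out)
{
    /* %4s stores at most 4 non-whitespace characters plus '\0' */
    return sscanf(input, "%4s", out);
}
```

Given the input "abcdefgh" and char A[5], only "abcd" is stored; with scanf the remaining characters would stay in the input stream for the next read.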
There isn't a character that is reserved, so you must be careful not to fill the entire array to the point it can't be null terminated. Char functions rely on the null terminator, and you will get disastrous results from them if you find yourself in the situation you describe.
Much C code that you'll see will use the 'n' derivatives of functions such as strncpy. From that man page you can read:
The strcpy() and strncpy() functions return s1. The stpcpy() and stpncpy() functions return a pointer to the terminating `\0' character of s1. If stpncpy() does not terminate s1 with a NUL character, it instead returns a pointer to s1[n] (which does not necessarily refer to a valid memory location.)
strlen also relies on the null character to determine the length of a character buffer. If and when you're missing that character, you will get incorrect results.
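The null character terminates the string stored in the array, not the array itself. It sits right after the last character of the string and shows that the string ends at that point. Nothing adds it automatically when you fill the array element by element; string literals and library functions such as scanf and fgets append it, and you have to leave room for it.
\0 is a terminator character, not an operator. If the input fills the array completely, there is no room left for \0 and it is written past the end of the array.
If the array is not full, \0 comes right after the last character of the string, and the remaining elements are unused.
When you enter a string, it is stored starting from the beginning of the array.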

Resources