Is null character included while allocating using malloc - c

I have been using C for quite sometime, and I have this trivial problem that I want to query about.
Say i want to create a character array that stores upto 1000 characters. Now, when I am using malloc for the same, then do I specify the size of array as 1001 character [ 1000 characters + null] or just 1000?
Also, say I came across this problem, then how could I have found the answer to this solution on my own, maybe by using some test programs. I understand the size of string is calculated without the null character, but when I am allocating the memory for the same, do I take into account the null character too?

If you need that block for storing null-terminated string then yes, you need to explictly ask malloc() to allocate an extra byte for storing the null-terminator, malloc() will not do it for you otherwise. If you intend to store the string length somewhere else and so you don't need the null terminator you can get away without allocating the extra byte. Of course it's up to you whether you need null-termination for strings, just don't forget that C library string handling functions only work with null-terminated strings.

malloc and family allocate memory in chunks of bytes. So if you do malloc(1000) you get 1000 bytes. malloc will not care if you allocated those 1000 bytes to hold a string or any other data type.
Since strings in C consist of one byte per character and ideally have to be null terminated you need to make sure you have enough memory to hold that. So the answer is: Yes, you need to allocate 1001 bytes if you wish to hold a string of 1000 characters plus null terminator.
Advanced tip: Also keep in mind that depending on how you use it you may or may not need to null terminate a string.
If you for instance know the exact length of your string you can specify that when using it with printf
printf("%*s", length, string);
will print exactly length characters from the buffer pointed at string.

It's up to you to provide the null-terminating character.
malloc allocates memory for you but it doesn't set it to anything.
If you strcpy to the allocated memory then you will have a null-terminator provided for you.
Alternatively, use calloc as it will set all elements to 0, which is in effect the null-terminator. Then if you do, say, memcpy, you wouldn't have to worry about terminating the string properly.

You do indeed need to allocate the memory for the null terminator.
Conceptually the null terminator is just a convenient way of marking the end of a string. The C standard library exploits this convention when modelling a string. For example, strlen computes the length of a string by examining the memory from the input location (probably a char*) until it reaches a null terminator; but the null terminator itself is not included in the length. But it's still part of the memory consumed by the string.

Related

Properties of strcpy()

I have a global definition as following:
#define globalstring "example1"
typedef struct
{
char key[100];
char trail[10][100];
bson_value_t value;
} ObjectInfo;
typedef struct
{
ObjectInfo CurrentOrderInfoSet[5];
} DataPackage;
DataPackage GlobalDataPackage[10];
And I would like to use the strcpy() function in some of my functions as following:
strcpy(GlobalDataPackage[2].CurrentOrderInfoSet[0].key, "example2");
char string[100] = "example3";
strcpy(GlobalDataPackage[2].CurrentOrderInfoSet[0].key, string);
strcpy(GlobalDataPackage[2].CurrentOrderInfoSet[0].key, globalstring);
First question: Are the global defined strings all initiated with 100 times '\0'?
Second qestion: I am a bit confused as to how exactly strcpy() works. Does it only overwrite the characters necessary to place the source string into the destination string plus a \0 at the end and leave the rest as it is or does it fully delete any content of the destination string prior to that?
Third question: All my strings are fixed length of 100. If I use the 3 examples of strcpy() above, with my strings not exceeding 99 characters, does strcpy() properly overwrite the destination string and NULL terminate it? Meaning do I run into problems when using functions like strlen(), printf() later?
Fourth question: What happens when I strcpy() empty strings?
I plan to overwrite these strings in loops various times and would like to know if it would be safer to use memset() to fully "empty" the strings prior to strcpy() on every iteration.
Thx.
Are the global defined strings all initiated with 100 times '\0'?
Yes. Global char arrays will be initilizated to all zeros.
I am a bit confused as to how exactly strcpy() works. Does it only overwrite the characters necessary to place the source string into the destination string plus a \0 at the end and leave the rest as it
Exactly. It copies the characters up until and including '\0' and does not care about the rest.
If I use ... my strings not exceeding 99 characters, does strcpy() properly overwrite the destination string and NULL terminate it?
Yes, but NULL is a pointer, it's terminated with zero byte, sometimes called NUL. You might want to see What is the difference between NUL and NULL? .
Meaning do I run into problems when using functions like strlen(), printf() later?
Not if your string lengths are less than or equal to 99.
What happens when I strcpy() empty strings?
It just copies one zero byte.
would like to know if it would be safer to use memset() to fully "empty" the strings prior to strcpy() on every iteration.
Safety is a broad concept. As far as safety as in if the program will execute properly, there is no point in caring about anything after zero byte, so just strcpy it.
But you should check if your strings are less than 99 characters and handle what to do it they are longer. You might be interested in strnlen, but the interface is confusing - I recommend to use memcpy + explicitly manually set zero byte.

A little query, String in C

Recently I was programming in my Code Blocks and I did a little program only for hobby in C.
char littleString[1];
fflush( stdin );
scanf( "%s", littleString );
printf( "\n%s", littleString);
If I created a string of one character, why does the CodeBlocks allow me to save 13 characters?
C have no bounds-checking, writing out of bounds of arrays or dynamically allocated memory can't be checked by the compiler. Instead it will lead to undefined behavior.
To prevent buffer overflow with scanf you can tell it to only read a specific number of characters, and nothing more. So to tell it to read only one character you use the format "%1s".
As a small side-note: Remember that strings in C have an extra character in them, the terminator (character '\0'). So if you have a string that should contain one character, the size actually needs to be two characters.
LittleString is not a string. It is a char array of length one. In order for a char array to be a string, it must be null terminated with an \0. You are writing past the memory you have allotted for littleString. This is undefined behavior.Scanf just reads user input from the console and assigns it to the variable specified, in this case littleString. If you would like to control the length of user input which is assigned to the variable, I would suggest using scanf_s. Please note that scanf_s is not a C99 standard
Many functions in C is implemented without any checks for correctness of use. In other words, it is the callers responsibility that the arguments fulfill some rules set by the function.
Example: For strcpy the Linux man page says
The strcpy() function copies the string pointed to by src,
including the terminating null byte ('\0'), to the buffer
pointed to by dest. The strings may not overlap, and the
destination string dest must be large enough to receive the copy.
If you as a caller break that contract by passing a too small buffer, you'll have undefined behavior and anything can happen.
The program may crash or even do exactly what you expected in 99 out of 100 times and do something strange in 1 out of 100 times.

Null termination of char array

Consider following case:
#include<stdio.h>
int main()
{
char A[5];
scanf("%s",A);
printf("%s",A);
}
My question is if char A[5] contains only two characters. Say "ab", then A[0]='a', A[1]='b' and A[2]='\0'.
But if the input is say, "abcde" then where is '\0' in that case. Will A[5] contain '\0'?
If yes, why?
sizeof(A) will always return 5 as answer. Then when the array is full, is there an extra byte reserved for '\0' which sizeof() doesn't count?
If you type more than four characters then the extra characters and the null terminator will be written outside the end of the array, overwriting memory not belonging to the array. This is a buffer overflow.
C does not prevent you from clobbering memory you don't own. This results in undefined behavior. Your program could do anything—it could crash, it could silently trash other variables and cause confusing behavior, it could be harmless, or anything else. Notice that there's no guarantee that your program will either work reliably or crash reliably. You can't even depend on it crashing immediately.
This is a great example of why scanf("%s") is dangerous and should never be used. It doesn't know about the size of your array which means there is no way to use it safely. Instead, avoid scanf and use something safer, like fgets():
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored after the last character in the buffer.
Example:
if (fgets(A, sizeof A, stdin) == NULL) {
/* error reading input */
}
Annoyingly, fgets() will leave a trailing newline character ('\n') at the end of the array. So you may also want code to remove it.
size_t length = strlen(A);
if (A[length - 1] == '\n') {
A[length - 1] = '\0';
}
Ugh. A simple (but broken) scanf("%s") has turned into a 7 line monstrosity. And that's the second lesson of the day: C is not good at I/O and string handling. It can be done, and it can be done safely, but C will kick and scream the whole time.
As already pointed out - you have to define/allocate an array of length N + 1 in order to store N chars correctly. It is possible to limit the amount of characters read by scanf. In your example it would be:
scanf("%4s", A);
in order to read max. 4 chars from stdin.
character arrays in c are merely pointers to blocks of memory. If you tell the compiler to reserve 5 bytes for characters, it does. If you try to put more then 5 bytes in there, it will just overwrite the memory past the 5 bytes you reserved.
That is why c can have serious security implementations. You have to know that you are only going to write 4 characters + a \0. C will let you overwrite memory until the program crashes.
Please don't think of char foo[5] as a string. Think of it as a spot to put 5 bytes. You can store 5 characters in there without a null, but you have to remember you need to do a memcpy(otherCharArray, foo, 5) and not use strcpy. You also have to know that the otherCharArray has enough space for those 5 bytes.
You'll end up with undefined behaviour.
As you say, the size of A will always be 5, so if you read 5 or more chars, scanf will try to write to a memory, that it's not supposed to modify.
And no, there's no reserved space/char for the \0 symbol.
Any string greater than 4 characters in length will cause scanf to write beyond the bounds of the array. The resulting behavior is undefined and, if you're lucky, will cause your program to crash.
If you're wondering why scanf doesn't stop writing strings that are too long to be stored in the array A, it's because there's no way for scanf to know sizeof(A) is 5. When you pass an array as the parameter to a C function, the array decays to a pointer pointing to the first element in the array. So, there's no way to query the size of the array within the function.
In order to limit the number of characters read into the array use
scanf("%4s", A);
There isn't a character that is reserved, so you must be careful not to fill the entire array to the point it can't be null terminated. Char functions rely on the null terminator, and you will get disastrous results from them if you find yourself in the situation you describe.
Much C code that you'll see will use the 'n' derivatives of functions such as strncpy. From that man page you can read:
The strcpy() and strncpy() functions return s1. The stpcpy() and
stpncpy() functions return a
pointer to the terminating `\0' character of s1. If stpncpy() does not terminate s1 with a NUL
character, it instead returns a pointer to s1[n] (which does not necessarily refer to a valid mem-
ory location.)
strlen also relies on the null character to determine the length of a character buffer. If and when you're missing that character, you will get incorrect results.
the null character is used for the termination of array. it is at the end of the array and shows that the array is end at that point. the array automatically make last character as null character so that the compiler can easily understand that the array is ended.
\0 is an terminator operator which terminates itself when array is full
if array is not full then \0 will be at the end of the array
when you enter a string it will read from the end of the array

Scanning a file and allocating correct space to hold the file

I am currently using fscanf to get space delimited words. I establish a char[] with a fixed size to hold each of the extracted words. How would I create a char[] with the correct number of spaces to hold the correct number of characters from a word?
Thanks.
Edit: If I do a strdup on a char[1000] and the char[1000] actually only holds 3 characters, will the strdup reserve space on the heap for 1000 or 4 (for the terminating char)?
Here is a solution involving only two allocations and no realloc:
Determine the size of the file by seeking to the end and using ftell.
Allocate a block of memory this size and read the whole file into it using fread.
Count the number of words in this block.
Allocate an array of char * able to hold pointers to this many words.
Loop through the block of text again, assigning to each pointer the address of the beginning of a word, and replacing the word delimiter at the end of the word with 0 (the null character).
Also, a slightly philosophical matter: If you think this approach of inserting string terminators in-place and breaking up one gigantic string to use it as many small strings is ugly, hackish, etc. then you probably should probably forget about programming in C and use Python or some other higher-level language. The ability to do radically-more-efficient data manipulation operations like this while minimizing the potential points of failure is pretty much the only reason anyone should be using C for this kind of computation. If you want to go and allocate each word separately, you're just making life a living hell for yourself by doing it in C; other languages will happily hide this inefficiency (and abundance of possible failure points) behind friendly string operators.
There's no one-and-only way. The idea is to just allocate a string large enough to hold the largest possible string. After you've read it, you can then allocate a buffer of exactly the right size and copy it if needed.
In addition, you can also specify a width in your fscanf format string to limit the number of characters read, to ensure your buffer will never overflow.
But if you allocated a buffer of, say 250 characters, it's hard to imaging a single word not fitting in that buffer.
char *ptr;
ptr = (char*) malloc(size_of_string + 1);
char first = ptr[0];
/* etc. */

Doubt in count value passed to strncat

Suppose I have an array of size 10 characters (memset to 0), which I am passing to strncat as destination, and in source I am passing a string which is say 20 characters in length (null terminated), now should I pass the 'count' as 10 or 9?
The doubt is, does strncpy considers the 'count' as size of destination buffer or does it just copy 10 characters to the destination and then append a NULL terminating character in the 11th position.
Sorry if the question appears too trivial, but I was unable to make this out from the help documentation of strncpy.
You should probably just read the man page a bit more. The operative sentence seems to be this one:
The strncat() function is similar,
except that it will use at most n
characters from src. Since the result
is always terminated with '\0', at
most n+1 characters are written.
strncat will apply the null terminal for you. Since it has no knowledge about the alleged string you are pointing to, it will assume there is space for the null terminal. So you want to pass in 9.
If your array only has room for 10 characters then your count should be 9, as strncat will try to append count characters from src plus a null terminator.
Also, in this case your destination should have a null terminator in the first position, because that is where you need it to start appending.
$ man strncat
If src contains n or more characters, strncat() writes n+1 characters
to dest (n from src plus the terminating null byte). Therefore, the
size of dest must be at least strlen(dest)+n+1
$
The simple answer is: Ensure your buffer is big enough, if your buffer is to hold 10 characters, add one on to the size of the buffer to accomodate the nul character \0. That cannot be stressed enough and is one of the biggest stumbling blocks of learning C.
If you did not specify the appropriate length excluding the nul character, the buffer overflows and unpredictable results will occur, such as program crash, or jump off into the woods never to be seen again.
Hope this helps,
Best regards,
Tom.

Resources