Right methods of copying strings with malloc - c

I want to copy the string "Best School" into a new space in memory. Which of these statements can I use to reserve enough space for it?
A. malloc(sizeof("Best School"))
B. malloc(strlen("Best School"))
C. malloc(11)
D. malloc(12)
E. malloc(sizeof("Best School") + 1)
F. malloc(strlen("Best School") + 1)
I am still very new to the C programming language, so I am not sure which of these work. I would love for someone to show me which ones can be used and why they should be used.
Thank you.

Literal strings in C are really arrays, including the null-terminator.
When you use sizeof on a literal string, you get the size of the array, which of course includes the null-terminator inside the array.
So one correct way for a literal string would be sizeof("Best School") (or sizeof "Best School").
You can also use strlen. If you don't have a string literal but another array or a pointer to the first character of the string, then you must use strlen. But now you have to remember that strlen returns the length of the string without the null-terminator. So you need to add one for that.
So another correct way would then be strlen("Best School") + 1.
Using magic numbers is almost never correct.
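To make the two correct approaches concrete, here is a minimal sketch. The helper names (bytes_via_sizeof, bytes_via_strlen, copy_best_school) are made up for illustration and are not part of the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Bytes needed for the literal, counted by the compiler (terminator included). */
size_t bytes_via_sizeof(void)
{
    return sizeof "Best School";
}

/* Bytes needed, counted at run time: visible characters plus one terminator. */
size_t bytes_via_strlen(void)
{
    return strlen("Best School") + 1;
}

/* Hypothetical helper: returns a heap copy of the literal, or NULL on failure. */
char *copy_best_school(void)
{
    size_t size = bytes_via_strlen();
    assert(size == bytes_via_sizeof());     /* both approaches agree */
    char *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, "Best School", size);  /* copies the terminator too */
    return copy;
}
```

"Best School" has 11 visible characters, so both helpers return 12. That means options A, D and F reserve enough space, B and C are one byte short, and E reserves 13 bytes, which works but wastes a byte.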

Use of sizeof is limited to only this one case (a string literal). It will not work if you have a pointer referencing the string. Until you are more proficient in C and "feel" the difference between arrays and pointers, IMO you should always use strlen.
Example:
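#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of str, or NULL if allocation fails. The caller must free() it. */
char *duplicateString(const char *str)
{
    char *newstring = malloc(strlen(str) + 1); /* +1 for the null terminator */
    if(newstring) strcpy(newstring, str);
    return newstring;
}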
In this case, sizeof(str) would give the size of the pointer to char (usually 2, 4 or 8), not the length of the string referenced by str.
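The difference can be checked with a small sketch. Both function names below are hypothetical and exist only to show what each expression measures once the string arrives as a pointer.

```c
#include <assert.h>
#include <string.h>

/* Inside a function, a string parameter is only a pointer. */
size_t size_seen_by_sizeof(const char *str)
{
    assert(str != NULL);
    return sizeof str;          /* size of the pointer itself */
}

/* What malloc actually needs in order to hold a copy of the string. */
size_t size_needed_for_copy(const char *str)
{
    assert(str != NULL);
    return strlen(str) + 1;     /* characters plus the terminator */
}
```

Whatever string is passed in, size_seen_by_sizeof returns sizeof(char *), while size_needed_for_copy follows the actual contents.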

Related

Pointer arithmetic in C when used as a target array for strcat()

When studying string manipulation in C, I've come across an effect that's not quite what I would have expected with strcat(). Take the following little program:
#include <stdio.h>
#include <string.h>
int main()
{
char string[20] = "abcde";
strcat(string + 1, "fghij");
printf("%s", string);
return 0;
}
I would expect this program to print out bcdefghij. My thinking was that in C, strings are arrays of characters, and the name of an array is a pointer to its first element, i.e., the element with index zero. So the variable string is a pointer to a. But if I calculate string + 1 and use that as the destination array for concatenation with strcat(), I get a pointer to a memory address that's one array element (1 * sizeof(char), in this case) away, and hence a pointer to the b. So my thinking was that the target destination is the array starting with b (and ending with the invisible null character), and to that the fghij is concatenated, giving me bcdefghij.
But that's not what I get - the output of the program is abcdefghij. It's the exact same output as I would get with strcat(string, "fghij"); - the addition of the 1 to string is ignored. I also get the same output with an addition of another number, e.g. strcat(string + 4, "fghij");, for that matter.
Can somebody explain to me why this is the case? My best guess is that it has to do with the binding precedence of the + operator, but I'm not sure about this.
Edit: I increased the size of the original array with char string[20] so that it will, in any case, be big enough to hold the concatenated string. Output is still the same, which I think means the array overflow is not key to my question.
You will get an output of abcdefghij, because your call to strcat hasn't changed the address of string (and nor can you change that – it's fixed for the duration of the block in which it is declared, just like the address of any other variable). What you are passing to strcat is the address of the second element of the string array: but that is still interpreted as the address of a nul-terminated string, to which the call appends the second (source) argument. Appending that second argument's content to string, string + 1 or string + n will produce the same result in the string array, so long as there is a nul-terminator at or after the n index.
To print the value of the string that you actually pass to the strcat call (i.e., starting from the 'b' character), you can save the return value of the call and print that:
#include <stdio.h>
#include <string.h>
int main()
{
char string[20] = "abcde";
char* result = strcat(string + 1, "fghij"); // strcat will return the "string + 1" pointer
printf("%s", result); // bcdefghij
return 0;
}
char string[] = "abcde";
strcat(string + 1, "fghij");
Append five characters to a full string array. Booom. Undefined behavior.
Adding something to a string array is a performance optimization that tells the runtime that the string is known to be at least that many characters long.
You seem to believe that a string is a thing of its own and not an array, and strcat is doing something to its first argument. That's not how that works. Strings are arrays*; and strcat is modifying the array contents.
*Somebody's going to come by and claim that heap allocated strings are not arrays. OP is not dealing with heap yet.
Arrays are non-modifiable lvalues. For example you may not write
char string[20] = "abcde";
char string2[] = "fghij";
string = string2;
Arrays used in expressions are, with rare exceptions, implicitly converted to pointers to their first elements.
If you write, for example, string + 1, the address of the array is not changed.
In this call
strcat(string + 1, "fghij");
elements of the array string are being overwritten starting from the second element of the array.
In this statement
printf("%s", string);
the whole array is output starting from its first character (again, the array designator used as an argument is converted to a pointer to its first element).
You could write for example
printf("%s", string + 1);
In this case the array is output starting from its second element.
These are just two pointers to different parts of the same memory inside the same array. There is nothing in your code which creates a second array. "the name of an array is a pointer to its first element" well, not really, it decays into a pointer to its first element whenever used in an expression. So in case of string + 1, this decay first happens to the string operand and then you get pointer arithmetic afterwards. You can actually never do pointer arithmetic on array types, only on decayed pointers. Details here: Do pointers support "array style indexing"?
As for strcat, it basically does two things: call strlen on the destination string to find where it ends, then call strcpy to append the new string at the position where the null terminator was stored. It's the very same thing as typing strcpy(&dst[strlen(dst)], src);
Therefore it won't matter if you pass string + 1 or string, because in either case strcat will look for the null terminator and nothing else.
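The strlen-then-strcpy description can be sketched as follows. append_sketch is a hypothetical stand-in for illustration, not the actual library implementation of strcat.

```c
#include <assert.h>
#include <string.h>

/* Appends src at the terminator of dst and returns dst, as strcat does.
   The caller must guarantee that dst has room for the result. */
char *append_sketch(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    strcpy(dst + strlen(dst), src);   /* overwrite starting at the old '\0' */
    return dst;
}
```

Calling append_sketch(string + 1, "fghij") on char string[20] = "abcde" finds the same terminator as a call with string would, so the array ends up holding abcdefghij either way. Only the returned pointer differs: it is string + 1, which reads as bcdefghij.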

C - Convert int to char

I am looking for a solution for my problem (newbie here).
I have an array of strings (char** arrayNum) and I would like to store the number of elements of that array at the first index.
But I can't find the right way to convert the number of elements to a string (or character).
I have tried itoa, casting (char), +'0', snprintf ... nothing works.
As every time I ask a question on Stack Overflow, I am sure the solution will be obvious. Thanks in advance for your help.
So I have an array of strings char** arrayNum which I populate, leaving the index 0 empty.
When I try to assign a string to arrayNum[0]:
This works: arrayNum[0] = "blabla";
This does not work: arrayNum[0] = (char) ((arraySize - 1)+'0');
I have tried countless others combinations, I don't even remember...
arrayNum can be thought of as an array of strings (char *). So you will naturally have trouble trying to assign a char (or indeed any type other than char *) to an element of this array.
I think it would be preferable to store the length of the array separately from the array, for example in a struct. To do otherwise invites confusion.
If you really, really want to store the length in the first element, then you could do something like:
arrayNum[0] = malloc(sizeof(char));
arrayNum[0][0] = (char) ((arraySize - 1)+'0');
This takes advantage of the fact that arrayNum is strictly an array of pointers and each of those pointers is a pointer to a char. So it can point to a single character or an array of characters (a "string").
Compare this for clarity with (say):
struct elements {
int length;
char **data;
};
arrayNum is not an "array of strings."
It might be useful for you to think about it that way, but it is important for you to know what it really is. It is an array of pointers where each pointer is a pointer to char.
Sometimes a pointer to char is a "string," and sometimes it's a pointer into the middle of a string, and sometimes it's just a pointer to some character somewhere. It all depends on how you use it.
The C programming language does not really have strings. It has string literals, but a string literal is just an array of characters (one that must not be modified) that happens to end with a \0. The reason you can write arrayNum[0] = "blabla"; is that the value of the string literal "blabla", when used this way, is a pointer to the first 'b' in "blabla", and the elements of the arrayNum array are pointers to characters.
It's your responsibility to decide whether arrayNum[i] points to the first character of some string, or whether it just happens to point to some single character; and it's your responsibility to decide and keep track of whether it points to something that needs to be passed to free(), or whether it points to read-only memory, or whether it points to something on the stack, or whether it points to/into some statically allocated data structure.
The language doesn't care.
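If the count really has to live in arrayNum[0] as text, one option is to format it into freshly allocated memory with snprintf, which also handles counts above 9 (the +'0' trick does not). This is a sketch under that assumption; count_to_string and store_count are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns a heap-allocated decimal string for count, or NULL on failure. */
char *count_to_string(size_t count)
{
    /* With a NULL buffer and size 0, snprintf only reports the length needed. */
    int digits = snprintf(NULL, 0, "%zu", count);
    if (digits < 0)
        return NULL;
    size_t size = (size_t)digits + 1;       /* +1 for the terminator */
    char *text = malloc(size);
    if (text != NULL)
        snprintf(text, size, "%zu", count);
    return text;
}

/* Stores the count as a string in slot 0. Returns 1 on success, 0 on failure.
   The caller owns arrayNum[0] afterwards and must free() it. */
int store_count(char **arrayNum, size_t count)
{
    assert(arrayNum != NULL);
    arrayNum[0] = count_to_string(count);
    return arrayNum[0] != NULL;
}
```

Slot 0 now points to heap memory while the other slots may point to string literals, which is exactly the kind of ownership bookkeeping the answers above warn about.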

How do I Initialize C code while only using words

How do I initialize my code if all I'm using are words and no numbers?
I have been trying to just use char *, but the compiler says it's still not initialized.
char *Carson;
printf("Enter a name:\n");
scanf("%s",Name);
printf("%s Hello Carson\n", Carson);
You either have to allocate memory dynamically and assign it to Carson (see e.g. malloc), or make it an array. There's no way around it. And for that, the code must contain a number. The number could be input from the user though, so you won't have any actual numbers in the source.
Remember that in C all strings need an extra terminator character (added automatically by scanf) so remember to add space for it.
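A sketch of leaving room for the terminator: the buffer below holds 32 bytes, so the format width is 31. NAME_SIZE and parse_name are assumptions made for this example, and sscanf is used instead of scanf only so the sketch can be exercised without keyboard input; the same width rule applies to scanf.

```c
#include <assert.h>
#include <stdio.h>

#define NAME_SIZE 32   /* assumed capacity, terminator included */

/* Reads one whitespace-delimited word from input into name.
   Returns 1 if a word was read, 0 otherwise. */
int parse_name(const char *input, char name[NAME_SIZE])
{
    assert(input != NULL && name != NULL);
    /* %31s stores at most 31 characters and then the terminator: 32 bytes. */
    return sscanf(input, "%31s", name) == 1;
}
```

Without the width, a long input would be written past the end of the array.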
A solution without any number, I don't think this must be used for practical applications, just a hack
char Carson[sizeof(long long) * sizeof(long long)];
printf("Size = %zu\n", sizeof Carson);
printf("Enter a name:\n");
scanf("%s",Carson);
printf("%s Hello Carson\n", Carson);
On my system it creates a char array of 64 bytes = 8 * 8. The size of long long is 8 bytes on most systems, although its size depends on your compiler and operating system.
you might like to initialize Carson like this:
char *Carson = malloc(sizeof(char)*200);/* for 200 characters */
Don't forget to leave room for the \0 terminator, and do not forget to free the memory once you are done using it.
In order to initialize variables in C you need to use constant values, that is, expressions whose value can be known at compile time.
For integer or float types you can use mathematical formulas involving only constant operands; the result is still a constant value that can be used in an initialization.
What you call "words" are better called "strings".
In C you are able to use strings that are constant at compile time, also called "string literals".
A string literal has to be indicated surrounded by quotes, like these examples:
"Hello world!"
"Peter & John"
"user#gmail.com"
and so on.
There are some rules that you need to remember: some special characters have "escape sequences" to be used inside a string literal.
Now you can use such string literals to initialize a char* variable:
char *name = "Mr. Smith";
char *city = "Amsterdam";
The result of the initialization is a C-style string, that is, an array of char whose length is the number of quoted characters in the string literal plus 1, because a null character is added at the end. Thus, in memory you have:
char *city ----> |A|m|s|t|e|r|d|a|m|\0|
Thus, city points to an array of 10 chars.
The last character, \0, means "null character", whose ASCII code is 0. Since it corresponds to a non-printable character, it has to be indicated with the escape sequence \0.
For more information, take a look on these websites:
Escape sequences in C
Storage of string literals
If you initialize a pointer to char with a string literal, the compiler reserves memory automatically for you, so you don't need any malloc() at all.
However, you cannot modify the characters of such a string.
If you want to modify the characters, you are better off using an array of char:
char name[30] = "Schwarzenegger";
The array reserves 30 chars for the string literal "Schwarzenegger".
Only the first 14 are used for the string, plus 1 holding the null character attached at the end.
The remaining chars of the array are initialized to zero, so they cause no problem and are not printed. (The standard library functions always stop processing the string when they encounter a null character.)
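A small sketch of what the 30-element array contains and why it may be modified. Both function names are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the zero bytes from the end of the text to the end of the array. */
size_t zero_bytes_after_text(void)
{
    char name[30] = "Schwarzenegger";
    size_t zeros = 0;
    for (size_t i = strlen(name); i < sizeof name; i++) {
        if (name[i] == '\0')
            zeros++;
    }
    return zeros;   /* the terminator plus the zero-filled remainder */
}

/* The array is a private, writable copy of the literal's characters. */
size_t shorten_in_place(void)
{
    char name[30] = "Schwarzenegger";
    assert(strlen(name) == 14);
    name[7] = '\0';             /* the array now holds "Schwarz" */
    return strlen(name);
}
```

The same write through a char * that points at a string literal would be undefined behavior, which is why the array form is the one to use for editable text.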
EDITED More information.
About your particular error message: "lack of initialization", the problem is that in the definition of the pointer to char object:
char *name;
you only have a "pointer to" an undefined block of memory.
You have to specify the array of char that name will point to.
If you initialize with a string literal, there is not any problem, because the address of the string literal is passed to name.
But, since you are planning to use name for data input by means of scanf(), you have to allocate enough memory. You can do that as other users have already explained in their answers, that is, by using malloc().
I think you need to make some changes to your code:
char *Carson = NULL;
Carson = (char *)malloc(sizeof(char)*256);
printf("Enter a name:\n");
scanf("%255s",Carson );
printf("%s Hello Carson\n", Carson);
In place of 256 you can use whatever value you want, as long as the width in "%255s" stays one less than it.
Let me know if it works.

sizeof for a null terminated const char*

const char* a;
How do I make sure that string 'a' is null terminated? When a = "abcd" and I do sizeof(a), I get 4. Does that mean it's not null-terminated? If it were, would I have gotten 5?
sizeof(a) gives you the size of the pointer, not of the array of characters the pointer points to. It's the same as if you had said sizeof(char*).
You need to use strlen() to compute the length of a null-terminated string (note that the length returned does not include the null terminator, so strlen("abcd") is 4, not 5). Or, you can initialize an array with the string literal:
char a[] = "abcd";
size_t sizeof_a = sizeof(a); // sizeof_a is 5, because 'a' is an array not a pointer
The string literal "abcd" is null terminated; all string literals are null terminated.
You get 4 because that's the size of a pointer on your system. If you want to get the length of a nul terminated string, you want the strlen function in the C standard library.
The problem here is that you are confusing sizeof(), which is a compile-time operation, with the length of a string, which is a run-time property. The reason you get 4 back when you run sizeof(a) is that a is a pointer, and the size of a pointer on your (presumably 32-bit) system is 4 bytes. In order to get the length of the string, use strlen.
For the second question, how to make sure a string is null terminated: the only way to definitively do this is to null terminate the string yourself. Given only a char* there is no way to 100% guarantee it is properly null terminated. Great care must be taken to ensure that the contract between the producer and consumer of the char* is understood as to who terminates the string.
If you are handed a char array which may or may not have null-terminated data in it, there really isn't a good way to check. The best you can do is search for a null character up to a certain specified length (not indefinitely!). But 0 isn't exactly an unusual byte of data to find in an uninitialized area of memory.
This is one of the many things about C's de facto string standard that many people dislike. Finding the length of a string a client hands you is an O(n) search operation at best, and a segmentation fault at worst.
Another issue of course is that arrays and pointers are easily confused, because an array name decays to a pointer in most expressions. That means array_name + 2 is the same as &(array_name[2]), and when a is declared as a pointer, sizeof(a) is sizeof(char*), not the length of the array.
sizeof(a) is sizeof(const char*), the size of the pointer. It is not affected by the contents of a. For that, you want strlen.
Also, all double-quoted string literals like your "abcd" in source code are automatically null terminated.
sizeof(a) returns the size of the const char *a, not the size of what it is pointing to. You can use strlen(a) to find the length of the null-terminated string, and no, the result of strlen does not include the null terminator.
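The bounded search mentioned above can be sketched with memchr, which never looks past the length it is given. has_terminator_within is a hypothetical helper, and the caller is assumed to know how many bytes of the buffer are valid to read.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if a '\0' occurs within the first max_len bytes of buf. */
bool has_terminator_within(const char *buf, size_t max_len)
{
    assert(buf != NULL);
    return memchr(buf, '\0', max_len) != NULL;
}
```

For char a[] = "abcd", sizeof a is 5 and the check succeeds; for an array of exactly four characters with no terminator it fails, and calling strlen on such an array would read out of bounds.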

I'm new to C, can someone explain why the size of this string can change?

I have never really done much C but am starting to play around with it. I am writing little snippets like the one below to try to understand the usage and behaviour of key constructs/functions in C. The one below I wrote trying to understand the difference between char* string and char string[] and how the lengths of strings work. Furthermore I wanted to see if sprintf could be used to concatenate two strings and set it into a third string.
What I discovered was that the third string I was using to store the concatenation of the other two had to be set with char string[] syntax or the binary would die with SIGSEGV (Address boundary error). Setting it using the array syntax required a size so I initially started by setting it to the combined size of the other two strings. This seemed to let me perform the concatenation well enough.
Out of curiosity, though, I tried expanding the "concatenated" string to be longer than the size I had allocated. Much to my surprise, it still worked and the string size increased and could be printf'd fine.
My question is: Why does this happen, is it invalid or have risks/drawbacks? Furthermore, why is char str3[length3] valid but char str3[7] causes "SIGABRT (Abort)" when sprintf line tries to execute?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void main() {
char* str1 = "Sup";
char* str2 = "Dood";
int length1 = strlen(str1);
int length2 = strlen(str2);
int length3 = length1 + length2;
char str3[length3];
//char str3[7];
printf("%s (length %d)\n", str1, length1); // Sup (length 3)
printf("%s (length %d)\n", str2, length2); // Dood (length 4)
printf("total length: %d\n", length3); // total length: 7
printf("str3 length: %d\n", (int)strlen(str3)); // str3 length: 6
sprintf(str3, "%s<-------------------->%s", str1, str2);
printf("%s\n", str3); // Sup<-------------------->Dood
printf("str3 length after sprintf: %d\n", // str3 length after sprintf: 29
(int)strlen(str3));
}
This line is wrong:
char str3[length3];
You're not taking the terminating zero into account. It should be:
char str3[length3+1];
You're also trying to get the length of str3, while it hasn't been set yet.
In addition, this line:
sprintf(str3, "%s<-------------------->%s", str1, str2);
will overflow the buffer you allocated for str3. Make sure you allocate enough space to hold the complete string, including the terminating zero.
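One way to size the buffer from everything that will be written into it, including the separator and the terminating zero, is sketched below. join3 is a hypothetical helper; it uses snprintf with the computed size so the write cannot run past the allocation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated "left + middle + right", or NULL on failure.
   The caller must free() the result. */
char *join3(const char *left, const char *middle, const char *right)
{
    assert(left != NULL && middle != NULL && right != NULL);
    /* All three lengths, plus one byte for the terminating zero. */
    size_t size = strlen(left) + strlen(middle) + strlen(right) + 1;
    char *out = malloc(size);
    if (out != NULL)
        snprintf(out, size, "%s%s%s", left, middle, right);
    return out;
}
```

With the strings from the question, join3(str1, "<-------------------->", str2) would allocate room for the separator too, which char str3[length3] never did.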
void main() {
char* str1 = "Sup"; // a pointer to the statically allocated sequence of characters {'S', 'u', 'p', '\0' }
char* str2 = "Dood"; // a pointer to the statically allocated sequence of characters {'D', 'o', 'o', 'd', '\0' }
int length1 = strlen(str1); // the length of str1 without the terminating \0 == 3
int length2 = strlen(str2); // the length of str2 without the terminating \0 == 4
int length3 = length1 + length2;
char str3[length3]; // declare an array of 7 characters, uninitialized
So far so good. Now:
printf("str3 length: %d\n", (int)strlen(str3)); // What is the length of str3? str3 is uninitialized!
C is a primitive language. It doesn't have strings. What it does have is arrays and pointers. A string is a convention, not a datatype. By convention, people agree that "an array of chars is a string, and the string ends at the first null character". All the C string functions follow this convention, but it is a convention. It is simply assumed that you follow it, or the string functions will break.
So str3 is not a 7-character string. It is an array of 7 characters. If you pass it to a function which expects a string, then that function will look for a '\0' to find the end of the string. str3 was never initialized, so it contains random garbage. In your case, apparently, there was a '\0' after the 6th character so strlen returns 6, but that's not guaranteed. If it hadn't been there, then it would have read past the end of the array.
sprintf(str3, "%s<-------------------->%s", str1, str2);
And here it goes wrong again. You are trying to copy the string "Sup<-------------------->Dood\0" into an array of 7 characters. That won't fit. Of course the C function doesn't know this, it just copies past the end of the array. Undefined behavior, and will probably crash.
printf("%s\n", str3); // Sup<-------------------->Dood
And here you try to print the string stored at str3. printf is a string function. It doesn't care (or know) about the size of your array. It is given a string, and, like all other string functions, determines the length of the string by looking for a '\0'.
Instead of trying to learn C by trial and error, I suggest that you go to your local bookshop and buy an "introduction to C programming" book. You'll end up knowing the language a lot better that way.
There is nothing more dangerous than a programmer who half understands C!
What you have to understand is that C doesn't actually have strings, it has character arrays. Moreover, the character arrays don't have associated length information -- instead, string length is determined by iterating over the characters until a null byte is encountered. This implies, that every char array should be at least strlen + 1 characters in length.
C doesn't perform array bounds checking. This means that the functions you call blindly trust you to have allocated enough space for your strings. When that isn't the case, you may end up writing beyond the bounds of the memory you allocated for your string. For a stack allocated char array, you'll overwrite the values of local variables. For heap-allocated char arrays, you may write beyond the memory area of your application. In either case, the best case is you'll error out immediately, and the worst case is that things appear to be working, but actually aren't.
As for the assignment, you can't write something like this:
char *str;
sprintf(str, ...);
and expect it to work -- str is an uninitialized pointer, so the value is "not defined", which in practice means "garbage". Pointers are memory addresses, so an attempt to write to an uninitialized pointer is an attempt to write to a random memory location. Not a good idea. Instead, what you want to do is something like:
char *str = malloc(sizeof(char) * (n + 1)); // n is the length of the string to be stored
which allocates n+1 characters worth of storage and stores the pointer to that storage in str. Of course, to be safe, you should check whether or not malloc returns null. And when you're done, you need to call free(str).
The reason your code works with the array syntax is because the array, being a local variable, is automatically allocated, so there's actually a free slice of memory there. That's (usually) not the case with an uninitialized pointer.
As for the question of how the size of a string can change, once you understand the bit about null bytes, it becomes obvious: all you need to do to change the size of a string is futz with the null byte. For example:
char str[] = "Foo bar";
str[1] = (char)0; // I'd use the character literal, but this editor won't let me
At this point, the length of the string as reported by strlen will be exactly 1. Or:
char str[] = "Foo bar";
str[7] = '!';
after which strlen will probably crash, because it will keep trying to read more bytes from beyond the array boundary. It might encounter a null byte and then stop (and of course, return the wrong string length), or it might crash.
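The safe direction (shortening) can be sketched like this. cut_at is a hypothetical helper; it only ever moves the terminator to the left, so it stays inside the array.

```c
#include <assert.h>
#include <string.h>

/* Ends the string at index and returns the new length.
   index must not exceed the current length. */
size_t cut_at(char *str, size_t index)
{
    assert(str != NULL && index <= strlen(str));
    str[index] = '\0';      /* the string now ends here */
    return strlen(str);
}
```

For char str[] = "Foo bar", sizeof str stays 8 before and after cut_at(str, 3), while strlen(str) drops from 7 to 3: the container is fixed and only the position of the null byte changes.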
I've written all of one C program, so expect this answer to be inaccurate and incomplete in a number of ways, which will undoubtedly be pointed out in the comments. ;-)
Your str3 is too short. It needs room for the "<-------------------->" string literal as well, plus one extra byte for the null terminator.
"Out of curiosity, though, I tried expanding the "concatenated" string to be longer than the size I had allocated. Much to my surprise, it still worked and the string size increased and could be printf'd fine."
The behaviour is undefined so it may or may not segfault.
strlen returns the length of the string without the trailing null byte (\0, 0x00), but when you create a variable to hold the combined strings you need to add that 1 character.
char str3[length3 + 1];
…and you should be all set.
C strings are '\0' terminated and require an extra byte for that, so at the very least you need
char str3[length3 + 1]
to hold the terminator.
In sprintf() you are writing beyond the space allocated for str3. This is undefined behavior and may cause anything to happen (if you are lucky, it will crash). strlen() just searches for a null character starting from the memory location you specified, and here it finds one at the 29th position. It could just as well be the 129th, i.e. it will behave very erratically.
A few important points:
Just because it works doesn't mean it's safe. Going past the end of a buffer is always unsafe, and even if it works on your computer, it may fail under a different OS, different compiler, or even a second run.
I suggest you think of a char array as a container and a string as an object that is stored inside the container. In this case, the container must be 1 character longer than the object it holds, since a "null character" is required to indicate the end of the object. The container is a fixed size, and the object can change size (by moving the null character).
The first null character in the array indicates the end of the string. The remainder of the array is unused.
You can store different things in a char array (such as a sequence of numbers). It just depends on how you use it. But string functions such as printf() or strcat() assume that there is a null-terminated string to be found there.

Resources