I am quite new to C programming so feel free to correct me, I insist. My basic understanding of strings in C is when we initialize a string a null character is automatically assigned at the end of the string and the null character cannot be read read or written, but is used internally only.
So when I create a string of size 4 as char str[3] and assign a word to it say "RED" and print it using puts function or printf("%s",str), I get an unusual output printed as RED(SMIILEY FACE)
I then again reduce the size of string to char str[2] and assign RED to it and then compile it and the again receive a output stating RE(Smiley face)
If someone can explain it to me I will be thankful . Posting the C code below
int main()
{
char s1[3]="RED";
char s2[]="RED";
puts(s1);
puts(s2);
printf("%s",s1);
return 0;
}
char s1[3] = "RED";
Is a valid statement. It copies 3 characters from the constant string literal "RED" (which is 4 characters long) into the character array s1. There is no terminating '\0' in s1, because there is no room for it.
Note the copy, because s1 is mutable, while "RED" is not. This makes the statement different from e.g. const char *s1 = "RED";, where the string is not copied.
The result of both puts(s1) and printf("%s", s1) are undefined. There is no terminating '\0' in s1. Treating it as a string with one can lead to arbitrary behavior.
char s2[] = "RED";
Here, sizeof(s2) == 4, because "RED" has four characters, you need to count the trailing '\0' when calculating space.
The null character takes one exra character(byte). So you need to use an extra space in addition to the number of characters in the word you are initializing.
char s1[4]="RED"; //3 for RED and 1 for the null character
On the other hand
char s2[3]="RED";
there is no space for null character. "RED" is in there but you would encounter I/O problems when printing it as there is no null character stored at the end. Your data is stored fine but it can't be recognized properly by the printf as there is no null character.
char s2[]="RED";
This would work as memory of 4 (bytes) is automatically assigned which includes space for the terminating null character.
Related
I wanted to test things out with arrays on C as I'm just starting to learn the language. Here is my code:
#include <stdio.h>
main(){
int i,t;
char orig[5];
for(i=0;i<=4;i++){
orig[i] = '.';
}
printf("%s\n", orig);
}
Here is my output:
.....�
It is exactly that. What are those mysterious characters? What have i done wrong?
%s with printf() expects a pointer to a string, that is, pointer to the initial element of a null terminated character array. Your array is not null terminated.
Thus, in search of the terminating null character, printf() goes out of bound, and subsequently, invokes undefined behavior.
You have to null-terminate your array, if you want that to be used as a string.
Quote: C11, chapter §7.21.6.1, (emphasis mine)
s
If no l length modifier is present, the argument shall be a pointer to the initial element of an array of character type.280) Characters from the array are
written up to (but not including) the terminating null character. If the
precision is specified, no more than that many bytes are written. If the
precision is not specified or is greater than the size of the array, the array shall
contain a null character.
Quick solution:
Increase the array size by 1, char orig[6];.
Add a null -terminator in the end. After the loop body, add orig[i] = '\0';
And then, print the result.
char orig[5];//creates an array of 5 char. (with indices ranging from 0 to 4)
|?|?|?|0|0|0|0|0|?|?|?|?|
| ^memory you do not own (your mysterious characters)
^start of orig
for(i=0;i<=4;i++){ //attempts to populate array with '.'
orig[i] = '.';
|?|?|?|.|.|.|.|.|?|?|?|?|
| ^memory you do not own (your mysterious characters)
^start of orig
This results in a non null terminated char array, which will invoke undefined behavior if used in a function that expects a C string. C strings must contain enough space to allow for null termination. Change your declaration to the following to accommodate.
char orig[6];
Then add the null termination to the end of your loop:
...
for(i=0;i<=4;i++){
orig[i] = '.';
}
orig[i] = 0;
Resulting in:
|?|?|?|.|.|.|.|.|0|?|?|?|
| ^memory you do not own
^start of orig
Note: Because the null termination results in a C string, the function using it knows how to interpret its contents (i.e. no undefined behavior), and your mysterious characters are held at bay.
There is a difference between an array and a character array. You can consider a character array is an special case of array in which each element is of type char in C and the array should be ended (terminated) by a character null (ASCII value 0).
%s format specifier with printf() expects a pointer to a character array which is terminated by a null character. Your array is not null terminated and hence, printf function goes beyond 5 characters assigned by you and prints garbage values present after your 5th character ('.').
To solve your issues, you need to statically allocate the character array of size one more than the characters you want to store. In your case, a character array of size 6 will work.
#include <stdio.h>
int main(){
int i,t;
char orig[6]; // If you want to store 5 characters, allocate an array of size 6 to store null character at last position.
for (i=0; i<=4; i++) {
orig[i] = '.';
}
orig[5] = '\0';
printf("%s\n", orig);
}
There is a reason to waste one extra character space for the null character. The reason being whenever you pass any array to a function, then only pointer to first element is passed to the function (pushed in function's stack). This makes for a function impossible to determine the end of the array (means operators like sizeof won't work inside the function and sizeof will return the size of the pointer in your machine). That is the reason, functions like memcpy, memset takes an additional function arguments which mentions the array sizes (or the length upto which you want to operate).
However, using character array, function can determine the size of the array by looking for a special character (null character).
You need to add a NUL character (\0) at the end of your string.
#include <stdio.h>
main()
{
int i,t;
char orig[6];
for(i=0;i<=4;i++){
orig[i] = '.';
}
orig[i] = '\0';
printf("%s\n", orig);
}
If you do not know what \0 is, I strongly recommand you to check the ascii table (https://www.asciitable.com/).
Good luck
prinftf takes starting pointer of any memory location, array in this case and print till it encounter a \0 character. These type of strings are called as null terminated strings.
So please add a \0 at the end and put in characters till (size of array - 2) like this :
main(){
int i,t;
char orig[5];
for(i=0;i<4;i++){ //less then size of array -1
orig[i] = '.';
}
orig[i] = '\0'
printf("%s\n", orig);
}
When I run the program, the second printf() prints string2 with whatever was scanned into string1 attached to the end.
e.g. 123 was scanned into string1 then it prints: Is before "12ab123".
as opposed to 12ab.
Why not just "12ab"?
char string1[MAX_STR_LEN];
char string2[4]={'1','2','a','b'};
char five='5';
char abc[3]={'a','b','c'};
printf("Enter a string:");
scanf("%s", string1);
printf("Is before \"%s\":",string2);
A string is a null terminated char array in C.
Change
char string2[4]={'1','2','a','b'};
to
char string2[5]={'1','2','a','b', '\0'};
(which is the same as char string2[] = "12ab";)
You need to terminate your array with NULL character as
char string2[5]={'1','2','a','b','\0'};
When you are doing the scanf(), string1 is stored in next memory so it is printing string2 with string1. It will print upto it gets \0 so its Undefined Behavior
In your code
char string2[4]={'1','2','a','b'};
string2 is not null-terminated. Using that array as an argument to %s format specifier invokes undefined behavior, as it runs past the allocated memory in search of the null-terminator.
You need to add the null-terminator yourself like
char string2[5]={'1','2','a','b','\0'};
to use string2 as a string.
Also, alternatively, you can write
char string2[ ]= "12ab";
to allow the compiler to decide the size, which considers the space for (and adds) the null-terminator.
Same goes for abc also.
That said, you're scanning into string1 and printing string2, which is certainly not wrong, but does not make much sense, either.
Expanding on the previous answers, the strings appear to be joined due to the order the variables are stored in the stack memory. This won't always work the same on every processor architecture or compiler (optimiser settings can change this behaviour too).
If the format specifier %s does not have the precision flag then the function outputs characters until it encounteres zero character '\0'
Character array string2 is defined such a way that it does not have the terminating zero
char string2[4]={'1','2','a','b'};
So the function outputs characters beyond the array until it meets zero character.
You could use the precision flag that to specify explicitly how many characters you are going to output. For example
printf("Is before \"%4.4s\":",string2);
Or you could define the array that includes the terminating zero. For example
char string2[5] = { '1', '2', 'a', 'b', '\0' };
Take into account that in this case the size of the array if it is specified shall be equal at least to 5 ( though the size can be greater than 5; in this case other characters that do not have initializers will be zero-initialized)
or simply
char string2[] = { "12ab" };
or without braces
char string2[] = "12ab";
I want to assign the first two values from the hash array to the salt array.
char hash[] = {"HAodcdZseTJTc"};
char salt[] = {hash[0], hash[1]};
printf("%s", salt);
However, when I attempt this, the first two values are assigned and then all thirteen values are also assigned to the salt array. So my output here is not:
HA
but instead:
HAHAodcdZseTJTC
salt is not null-terminated. Try:
char salt[] = {hash[0], hash[1], '\0'};
Since you are adding just two characters to the salt array and you are not adding the '\0' terminator.
Passing a non nul terminated array as a parameter to printf() with a "%s" specifier, causes undefined behavior, in your case it prints hash in my case
HA#
was printed.
Strings in c use a special convetion to know where they end, a non printable special character '\0' is appended at the end of a sequence of non-'\0' bytes, and that's how a c string is built.
For example, if you were to compute the length of a string you would do something like
size_t stringlength(const char *string)
{
size_t length;
for (length = 0 ; string[length] != '\0' ; ++length);
return length;
}
there are of course better ways of doing it, but I just want to illustrate what the significance of the terminating '\0' is.
Now that you know this, you should notice that
char string[] = {'A', 'B', 'C'};
is an array of char but it's not a string, for it to be a string, it needs a terminating '\0', so
char string[] = {'A', 'B', 'C', '\0'};
would actually be a string.
Notice that then, when you allocate space to store n characters, you need to allocate n + 1 bytes, to make room for the '\0'.
In the case of printf() it will try to consume all the bytes that the passed pointer points at, until one of them is '\0', there it would stop iterating through the bytes.
That also explains the Undefined Behavior thing, because clearly printf() would be reading out of bounds, and anything could happen, it depends on what is actually there at the memory address that does not belong the the passed data but is off bounds.
There are many functions in the standard library that expect strings, i.e. _sequences of non nul bytes, followed by a nul byte.
Consider this code:
char name[]="123";
char name1[]="1234";
And this result
The size of name (char[]):4
The size of name1 (char[]):5
Why the size of char[] is always plus one?
Note the difference between sizeof and strlen. The first is an operator that gives the size of the whole data item. The second is a function that returns the length of the string, which will be less than its sizeof (unless you've managed to get string overflow), depending how much of its allocated space is actually used.
In your example
char name[]="123";
sizeof(name) is 4, because of the terminating '\0', and strlen(name) is 3.
But in this example:
char str[20] = "abc";
sizeof(str) is 20, and strlen(str) is 3.
As Michael pointed out in the comments the strings are terminated by a zero. So in memory the first string will look like this
"123\0"
where \0 is a single char and has the ASCII value 0. Then the above string has size 4.
If you had not this terminating character, how would one know, where the string (or char[] for that matter) ends? Well, indeed one other way is to store the length somewhere. Some languages do that. C doesn't.
In C, strings are stored as arrays of chars. With a recognised terminating character ('\0' or just 0) you can pass a pointer to the string, with no need for any further meta-data. When processing a string, you read chars from the memory pointed at by the pointer until you hit the terminating value.
As your array initialisation is using a string literal:
char name[]="123";
is equivalent to:
char name[]={'1','2','3',0};
If you want your array to be of size 3 (without the terminating character as you are not storing a string, you will want to use:
char name[]={'1','2','3'};
or
char name[3]="123";
(thanks alk)
which will do as you were expecting.
Because there is a null character that is attached to the end of string in C.
Like here in your case
name[0] = '1'
name[1] = '2'
name[2] = '3'
name[3] = '\0'
name1[0] = '1'
name1[1] = '2'
name1[2] = '3'
name1[3] = '4'
name1[4] = '\0'
A String in C (and in, probably, every programming language - behind the scenes) is an array of characters which is terminated by \0 with the ASCII value of 0.
When assigning: char arr[] = "1234";, you assign a string literal, which is, by default, null-terminated (\0 is also called null) as you can see here.
To avoid a null (assuming you want just an array of chars and not a string), you can declare it the following way char arr[] = {'1', '2', '3', '4'}; and the program will behave as you wish (sizeof(arr) would be 4).
name = {'1','2','3','\0'};
name1 = {'1','2','3','4','\0'};
So
sizeof(name) = 4;
sizeof(name1) = 5;
sizeof returns the size of the object and in this case the object is an array and it is defined that your array is 4 bytes long in first case and 5 bytes in second case.
In C, string literals have a null terminating character added to them.
Your strings,
char name[]="123";
char name1[]="1234";
look more like:
char name[]="123\0";
char name1[]="1234\0";
Hence, the size is always plus one. Keep in mind when reading strings from files or from whatever source, the variable where you store your string, should always have extra space for the null terminating character.
For example if you are expected to read string, whose maximum size is 100, your buffer variable, should have size of 101.
Every string is terminated with the char nullbyte '\0' which add 1 to your length.
I'm having difficulty with strncpy. I'm trying to split a string of 8 characters in two (the first 6 characters in one substring and then the remaining 2 characters in another). To illustrate the particular difficulty I have simplified my code to the following:
include stdio.h
include stdlib.h
include string.h
define MAXSIZE 100
struct word {
char string[8];
char sub1[2];
char sub2[6];
};
typedef struct word Word;
int main(void)
{
Word* p;
p=(Word*)malloc(MAXSIZE*sizeof(Word));
if (p==NULL) {
fprintf(stderr,"not enough memory");
return 0;
}
printf("Enter an 8-character string: \n");
scanf("%s",p->string);
strncpy(p->sub2,p->string,6);
strncpy(p->sub1,p->string,2);
printf("string=%s\n",p->string);
printf("sub1=%s\n",p->sub1);
printf("sub2=%s\n",p->sub2);
free(p);
return 0;
}
The user is prompted for an input. Suppose they input "12345678". Then the output of the program is:
string=1234567812123456
sub1=12123456
sub2=123456
The output I am expecting would be as follows:
string=12345678
sub1=12
sub2=123456
I don't understand how strncpy seems to be appending numbers to string... Obviously I don't understand strncpy well enough, but can anyone explain to me what's going on?
C strings need to be terminated with a null character (0).
strncpy does not put a null terminator on the string for you. If you want a 2-character string, you need to allocate room for three characters, and set the final one to null.
Try this:
struct word {
char string[9];
char sub1[3];
char sub2[7];
};
// ...
strncpy(p->sub2,p->string,6);
p->sub2[6] = 0;
strncpy(p->sub1,p->string,2);
p->sub1[2] = 0;
// ...
Note that if the user inputs more characters than you've allocated room for, you'll end up with problems.
You may find this part of the strncpy documentation useful:
The strncpy() function is similar, except that at most n bytes of src
are copied. Warning: If there is no null byte among the first n bytes
of src, the string placed in dest will not be null terminated.
You are printing strings that are not null-terminated. To fix this declare sub1 and sub2 with an extra char for the terminator:
char sub1[3];
char sub2[7];
And then null terminate after copying:
strncpy(p->sub2,p->string,6);
p->sub2[6] = '\0';
strncpy(p->sub1,p->string,2);
p->sub1[2] = '\0';
The strncpy() function copies at most
n characters from s2 into s1. If
s2 is less than n characters long, the remainder of s1 is filled
with `\0' characters. Otherwise, s1 is not terminated.
So given your string is longer, the strings are not zero terminated. When you print them, prinf is printing the characters you copied, but then carrying on printing whatever is there until it hits a NUL
Althoug scanf does NUL terminate its string, you've not allocated enough space. String in your stuct needs to be 9 characters long - 8 for the characters (12345678) and one more for the NUL. Right now the NUL is going in the first character of str1 - which you then overwrite with the strncpy