first character of a string using strncpy bug - c

I wanted to get the first character of a string using strncpy, here's my code
int main() {
char s[] = "abcdefghi";
char c[] = "";
printf("%c\n",c);
printf("%s\n",s);
strncpy(c,s,1);
printf("%c\n",c);
printf("%s\n",s);
return 0;
}
The problem is that c is still an empty string, what's wrong with it?

The problem is that c is an array (which decays into a pointer), while the %c format of printf() requires a character. If you pass the pointer when the function expects a character, you get undefined behavior, and a possible output will be the first byte of the address of the array, interpreted as an ASCII character.
Also, please note that char c[] = ""; simply declares an array of length 1 which contains the null (\0) character. If you try to print that, like you do in your code, you will likely get weird output.
If c is meant to only store a single character, declare it as a simple char variable, and pass it's address to strncpy():
char s[] = "abcdefghi";
char c = ' ';
printf("%c\n", c);
printf("%s\n", s);
strncpy(&c, s, 1); //notice the & before c
printf("%c\n", c);
printf("%s\n", s);
However, you have to be careful. Passing any value greater than 1 to strncpy() asks for big trouble, since you would be writing to an undefined location.

s is an array of 10 chars, c is an array with one char. Do you understand why? And which char is stored in c?
%s is the format for strings, %c is the format for chars. But c is not a char. An array containing one char is an array, not a char. So when you try to print c you get rubbish.
Now the bad one: strncpy in your case doesn’t produce a C string because there isn’t enough space for one. If you try to print c with %c you get nonsense. If you try to print it with %s you get undefined behaviour. Do NOT use strncpy unless you really know what it is doing (you should generally avoid strncpy).

Related

Why de-referencing is not used in case of printing of a string?

How we use a string name in C to print a string without dereferencing it?
Like here:
char str[];
printf("%s",str);
But in case of arrays we use dereferencing by square brackets[] to print an array:
int a[10];
for(i=0;i<10;i++);
printf("%d",a[i]);
What's the difference between printing a string and an array? We need dereferencing an array by using square brackets [] but in case of string without using any dereferencing printf(); is just printing the value of the string and not the address.
Because that is what the interface to printf dictates.
In case of a "%s" it expects a pointer to a NUL terminated string.
In case of a "%d" it expects an int.
It all depends on what the function (printf or else) expects.
When you use printf with %s, like
printf("%s",str);
printf() wants "a string" which, in C, is the same as "an array of [NUL-terminated] char". So, you pass the only thing that identifies actually the whole array: its name (which decays to the address of the first element).
The same applies to every function that expects an array as argument.
If, to the function you call, you must pass something else, you use different approaches. When printf is used to print an integer, like
printf("%d",a[i]);
you must pass an integer; and an array is not an integer, so you must dereference (or "select") an integer from the array, using the correct notation/mechanism. But things are not different if you want to use a member of a struct, for example. You can not pass printf() a struct, so you use the dot to select a [correct] member of the struct. And the same applies to pointers to integer, you dereference them when you want the value and not the pointer itself. And so on for functions... like
printf("%d", (int) floor(PI));
In the above line, again, if printf wants an integer, you must give it an integer, so to use PI (3.14) you must convert it to an int.
For starters for example such a code snippet
char str[5] = "Hello";
printf("%s",str);
can result in undefined behavior.
The format string "%s" expects that the argument is a pointer to first character of a string.
In C the string is defined as a sequence of characters terminated by the zero character '\0'. That is a character array that contains a string has the sentinel value '\0'.
So in fact such a call of printf
printf("%s",str);
logically is similar to the following code
while ( *str != '\0' ) putchar( *str++ );
In the example above
char str[5] = "Hello";
the character array str does not contain a string because its five elements were initialized by characters of the string literal "Hello" excluding its terminating zero character (there is no enough space in the array to accommodate the sixth character '\0' of the string literal).
So such a call
printf("%s",str);
for the array declared above will try to output anything that follows the array in the memory.
To avoid the undefined behavior you should declare the array for example like
char str[6] = "Hello";
or like
char str[] = "Hello";
As for arrays of other fundamental types then in most cases they do not have a standard sentinel value. Opposite to strings it is impossible to define such a sentinel value for example for integer or float arrays.
So character arrays that contains strings are exclusions from the total kinds of arrays. Strings always end with the terminating zero character '\0' by the definition of the string in C. That is not valid for other kinds of arrays.
Hence to output any array of other kinds you in general need to use a loop knowing how many elements you need to output.

Explanation needed on pointer arithmetic with an array of pointers to string literals

I'm currently trying to learn C, with some prior experience in Python and
pointer arithmetic is really confusing me right now.
#include <stdio.h>
int main()
{
char *s[] = {"String1", "Literal2", "Pointers3"};
printf("%c", s[1]+1);
return 0;
}
Why does it print m instead of i ?
When I replace the format string with %s it does what I expect and prints out iteral2(Go to the 0th index of the 1st string literal then move 1 memory adress forward and print the rest).
How does this work, why does it print out a seemingly arbitrary character instead of the 1st(or 1th?) index when I use the %c format string.
The %c format specifier expects a character, not a pointer to a character. The expression s[1] evaluates to a pointer to a character, pointing to "Literal2", and the expression s[1]+1 also evaluates to a pointer to a character, pointing to "iteral2".
So, you are passing printf() a pointer to a character, and you are telling it to print a character. So, what is happening is that the pointer is being re-interpreted as a character, and the output is garbage.
If you insert a character into "String1", (making it, say, "String11",) then everything will be moved upwards in memory by one location, so the value of the pointer will be greater by 1, and so it might print n instead of m.
To obtain a character, when all you have is a pointer to a character, you need to dereference the pointer. So, that would be "%c", *(s[1]+1).
#include <stdio.h>
int main() {
const char *s[] = {"String1", "Literal2", "Pointers3"};
printf("%c", s[1][1]);
return 0; }
Also i think you should make s[] constant as it's deprecated.

Char pointers and the printf function

I was trying to learn pointers and I wrote the following code to print the value of the pointer:
#include <stdio.h>
int main(void) {
char *p = "abc";
printf("%c",*p);
return 0;
}
The output is:
a
however, if I change the above code to:
#include <stdio.h>
int main(void) {
char *p = "abc";
printf(p);
return 0;
}
I get the output:
abc
I don't understand the following 2 things:
why did printf not require a format specifier in the second case? Is printf(pointer_name) enough to print the value of the pointer?
as per my understanding (which is very little), *p points to a contiguous block of memory that contains abc. I expected both outputs to be the same, i.e.
abc
are the different outputs because of the different ways of printing?
Edit 1
Additionally, the following code produces a runtime error. Why so?
#include <stdio.h>
int main(void) {
char *p = "abc";
printf(*p);
return 0;
}
For your first question, the printf function (and family) takes a string as first argument (i.e. a const char *). That string could contain format codes that the printf function will replace with the corresponding argument. The rest of the text is printed as-is, verbatim. And that's what is happening when you pass p as the first argument.
Do note that using printf this way is highly unrecommended, especially if the string is contains input from a user. If the user adds formatting codes in the string, and you don't provide the correct arguments then you will have undefined behavior. It could even lead to security holes.
For your second question, the variable p points to some memory. The expression *p dereferences the pointer to give you a single character, namely the one that p is actually pointing to, which is p[0].
Think of p like this:
+---+ +-----+-----+-----+------+
| p | ---> | 'a' | 'b' | 'c' | '\0' |
+---+ +-----+-----+-----+------+
The variable p doesn't really point to a "string", it only points to some single location in memory, namely the first character in the string "abc". It's the functions using p that treat that memory as a sequence of characters.
Furthermore, constant string literals are actually stored as (read-only) arrays of the number of character in the string plus one for the string terminator.
Also, to help you understand why *p is the same as p[0] you need to know that for any pointer or array p and valid index i, the expressions p[i] is equal to *(p + i). To get the first character, you have index 0, which means you have p[0] which then should be equal to *(p + 0). Adding zero to anything is a no-op, so *(p + 0) is the same as *(p) which is the same as *p. Therefore p[0] is equal to *p.
Regarding your edit (where you do printf(*p)), since *p returns the value of the first "element" pointed to by p (i.e. p[0]) you are passing a single character as the pointer to the format string. This will lead the compiler to convert it to a pointer which is pointing to whatever address has the value of that single character (it doesn't convert the character to a pointer to the character). This address is not a very valid address (in the ASCII alphabet 'a' has the value 97 which is the address where the program will look for the string to print) and you will have undefined behavior.
p is the format string.
char *p = "abc";
printf(p);
is the same as
print("abc");
Doing this is very bad practice because you don't know what the variable
will contain, and if it contains format specifiers, calling printf may do very bad things.
The reason why the first case (with "%c") only printed the first character
is that %c means a byte and *p means the (first) value which p is pointing at.
%s would print the entire string.
char *p = "abc";
printf(p); /* If p is untrusted, bad things will happen, otherwise the string p is written. */
printf("%c", *p); /* print the first byte in the string p */
printf("%s", p); /* print the string p */
why did printf not require a format specifier in the second case? Is printf(pointer_name) enough to print the value of the pointer?
With your code you told printf to use your string as the format string. Meaning your code turned equivalent to printf("abc").
as per my understanding (which is very little), *p points to a contiguous block of memory that contains abc. I expected both outputs to be the same
If you use %c you get a character printed, if you use %s you get a whole string. But if you tell printf to use the string as the format string, then it will do that too.
char *p = "abc";
printf(*p);
This code crashes because the contents of p, the character 'a' is not a pointer to a format string, it is not even a pointer. That code should not even compile without warnings.
You are misunderstanding, indeed when you do
char *p = "Hello";
p points to the starting address where literal "Hello" is stored. This is how you declare pointers. However, when afterwards, you do
*p
it means dereference p and obtain object where p points. In our above example this would yield 'H'. This should clarify your second question.
In case of printf just try
printf("Hello");
which is also fine; this answers your first question because it is effectively the same what you did when passed just p to printf.
Finally to your edit, indeed
printf(*p);
above line is not correct since printf expects const char * and by using *p you are passing it a char - or in other words 'H' assuming our example. Read more what dereferencing means.
why did printf not require a format specifier in the second case? Is printf(pointer_name) enough to print the value of the pointer?
"abc" is your format specifier. That's why it's printing "abc". If the string had contained %, then things would have behaved strangely, but they didn't.
printf("abc"); // Perfectly fine!
why did printf not require a format specifier in the second case? Is printf(pointer_name) enough to print the value of the pointer?
%c is the character conversion specifier. It instructs printf to only print the first byte. If you want it to print the string, use...
printf ("%s", p);
The %s seems redundant, but it can be useful for printing control characters or if you use width specifiers.
The best way to understand this really is to try and print the string abc%def using printf.
The %c format specifier expects a char type, and will output a single char value.
The first parameter to printf must be a const char* (a char* can convert implicitly to a const char*) and points to the start of a string of characters. It stops printing when it encounters a \0 in that string. If there is not a \0 present in that string then the behaviour of that function is undefined. Because "abc" doesn't contain any format specifiers, you don't pass any additional arguments to printf in that case.

pointer of char and int

Hi I have a simple question
char *a="abc";
printf("%s\n",a);
int *b;
b=1;
printf("%d\n",b);
Why the first one works but the second one doesnot work?
I think the first one should be
char *a="abc";
printf("%s\n",*a);
I think a stores the address of "abc". So why it shows abc when i print a? I think I should print *a to get the value of it.
Thanks
Ying
Why the first one works but the second one doesnot work?
Because in the first, you're not asking it to print a character, you're asking it to print a null-terminated array of characters as a string.
The confusion here is that you're thinking of strings as a "native type" in the same way as integers and characters. C doesn't work that way; a string is just a pointer to a bunch of characters ending with a null byte.
If you really want to think of strings as a native type (keeping in mind that they really aren't), think of it this way: the type of a string is char *, not char. So, printf("%s\n", a); works because you're passing a char * to match with a format specifier indicating char *. To get the equivalent problems as with the second example, you'd need to pass a pointer to a string—that is, a char **.
Alternatively, the equivalent of %d is not %s, but %c, which prints a single character. To use it, you do have to pass it a character. printf("%c\n", a) will have the same problem as printf("%d\n", b).
From your comment:
I think a stores the address of "abc". So why it shows abc when i print a? I think I should print *a to get the value of it.
This is where the loose thinking of strings as native objects falls down.
When you write this:
char *a = "abc";
What happens is that the compiler stores a array of four characters—'a', 'b', 'c', and '\0'—somewhere, and a points at the first one, the a. As long as you remember that "abc" is really an array of four separate characters, it makes sense to think of a as a pointer to that thing (at least if you understand how arrays and pointer arithmetic work in C). But if you forget that, if you think a is pointing at a single address that holds a single object "abc", it will confuse you.
Quoting from the GNU printf man page (because the C standard isn't linkable):
d, i
The int argument is converted to signed decimal notation …
c
… the int argument is converted to an unsigned char, and the resulting character is written…
s
… The const char * argument is expected to be a pointer to an array of character type (pointer to a string). Characters from the array are written up to (but not including) a terminating null byte ('\0') …
One last thing:
You may be wondering how printf("%s", a) or strchr(a, 'b') or any other function can print or search the string when there is no such value as "the string".
They're using a convention: they take a pointer to a character, and print or search every character from there up to the first null. For example, you could write a print_string function like this:
void print_string(char *string) {
while (*string) {
printf("%c", *string);
++string;
}
}
Or:
void print_string(char *string) {
for (int i=0; string[i]; ++i) {
printf("%c", string[i]);
}
}
Either way, you're assuming the char * is a pointer to the start of an array of characters, instead of just to a single character, and printing each character in the array until you hit a null character. That's the "null-terminated string" convention that's baked into functions like printf, strstr, etc. throughout the standard library.
Strings aren't really "first-class citizens" in C. In reality, they're just implemented as null-terminated arrays of characters, with some syntactic sugar thrown in to make programmers' lives easier. That means when passing strings around, you'll mostly be doing it via char * variables - that is, pointers to null-terminated arrays of char.
This practice holds for calling printf, too. The %s format is matched with a char * parameter to print the string - just as you've seen. You could use *a, but you'd want to match that with a %c or an integer format to print just the single character pointed to.
Your second example is wrong for a couple of reasons. First, it's not legal to make the assignment b = 1 without an explicit cast in C - you'd need b = (int *)1. Second, you're trying to print out a pointer, but you're using %d as a format string. That's wrong too - you should use %p like this: printf("%p\n", (void *)b);.
What it really looks like you're trying to do in the second example is:
int b = 1;
int *p = &b;
printf("%d\n", *p);
That is, make a pointer to an integer, then dereference it and print it out.
Editorial note: You should get a good beginner C book (search around here and I'm sure you'll find suggestions) and work through it.

Passing not null terminated string to printf results in unexpected value

This C program gives a weird result:
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[])
{
char str1[5] = "abcde";
char str2[5] = " haha";
printf("%s\n", str1);
return 0;
}
when I run this code I get:
abcde haha
I only want to print the first string as can be seen from the code.
Why does it print both of them?
"abcde" is actually 6 bytes long because of the null terminating character in C strings. When you do this:
char str1[5] = "abcde";
You aren't storing the null terminating character so it is not a proper string.
When you do this:
char str1[5] = "abcde";
char str2[5] = " haha";
printf("%s\n", str1);
It just happens to be that the second string is stored right after the first, although this is not required. By calling printf on a string that isn't null terminated you have already caused undefined behavior.
Update:
As stated in the comments by clcto this can be avoided by not explicitly specifying the size of the array and letting the compiler determine it based off of the string:
char str1[] = "abcde";
or use a pointer instead if that works for your use case, although they are not the same:
const char *str1 = "abcde";
Both strings str1 and str2 are not null terminated. Therefore the statement
printf("%s\n", str1);
will invoke undefined behavior.
printf prints the characters in a string one by one until it encounters a '\0' which is not present in your string. In this case printf continues past the end of the string until it finds a null character somewhere in the memory. In your case it seems that printf past the end of string "abcde" and continues to print the characters from second string " haha" which is by chance located just after first string in the memory.
Better to change the block
char str1[5] = "abcde";
char str2[5] = " haha";
to
char str1[] = "abcde";
char str2[] = " haha";
to avoid this problem.
Technically, this behavior is not unexpected, it is undefined: your code is passing a pointer to a C string that lacks null terminator to printf, which is undefined behavior.
In your case, though, it happens that the compiler places two strings back-to-back in memory, so printf runs into null terminator after printing str2, which explains the result that you get.
If you would like to print only the first string, add space for null terminator, like this:
char str1[6] = "abcde";
Better yet, let the compiler compute the correct size for you:
char str1[] = "abcde";
You have invoked undefined behaviour. Here:
char str1[5] = "abcde";
str1 has space for the above five letters but no null terminator.
Then the way you try to pass not null terminated string to printf for printing, invokes undefined behaviour.
In general in most of the cases it is not good idea to pass not null terminated strings to standard functions which expect (C) strings.
In C such declarations
char str1[5] = "abcde";
are allowed. In fact there are 6 initializers because the string literal includes the terminating zero. However in the left side there is declared a character array that has only 5 elements. So it does not include the terminating zero.
It looks like
char str1[5] = { 'a', 'b', 'c', 'd', 'e', '\0' };
If you would compile this declaration in C++ then the compiler issues an error.
It would be better to declare the array without specifying explicitly its size.
char str1[] = "abcde";
In this case it would have the size equal to the number of characters in the string literal including the terminating zero that is equal to 6. And you can write
printf("%s\n", str1);
Otherwise the function continues to print characters beyond the array until it meets the zero character.
Nevertheless it is not an error. Simply you should correctly specify the format specifier in the call of printf:
printf("%5.5s\n", str1);
and you will get the expected result.
The %s specifier searches for a null termination. Therefore you need to add '\0' to the end of your string
char str1[6] = "abcde\0";

Resources