C pointer behaviour, Volume II

Look at this:
#include <stdio.h>

int main() {
    char *verse = "zappa";
    printf("%c\n", *verse);
    // the program correctly prints the first character
    *verse++;
    printf("%c\n", *verse);
    // the program correctly prints the second character, which in fact lies
    // in the adjacent memory cell
    (*verse)++;
    printf("%c\n", *verse);
    // the program doesn't print anything and crashes. Why?
    return 0;
}
Why does my program crash when I try to increment the value pointed to by verse? I was expecting something like the next character in the ASCII table.

The line (*verse)++; modifies a string literal, which is undefined behavior.
Note that the earlier line, *verse++;, is parsed as *(verse++): it increments the pointer, not the character.

verse points to a string literal, which you're not allowed to modify. Try:
char verse1[] = "zappa";
char *verse = verse1;
Now your code will work because verse1 is a modifiable string.
Note that *verse++ is effectively equivalent to just verse++. The indirection returns the value pointed to by the pointer before the increment, but since you're not doing anything with the return value of the expression, the indirection doesn't really do anything.
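As a minimal sketch of the difference between the two expressions (the helper names bump_first_char and char_after_advance are made up for illustration, and the first one assumes the caller passes a writable array, never a string literal):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: increments the character that buf points to.
   The caller must pass a writable buffer, never a string literal. */
void bump_first_char(char *buf)
{
    assert(buf != NULL);
    (*buf)++;            /* modifies the pointed-to char, not the pointer */
}

/* Hypothetical helper: advances a local copy of the pointer by one
   and returns the character it then points to. */
char char_after_advance(const char *s)
{
    assert(s != NULL && *s != '\0');
    const char *p = s;
    (void)*p++;          /* parsed as *(p++): p moves, the value read is discarded */
    return *p;
}
```

With char verse1[] = "zappa";, calling bump_first_char(verse1) changes the first element to 'z' + 1, while char_after_advance("zappa") only reads the literal and returns 'a'.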

Related

How to write my_strchr() in C

Right now I hope to write my own my_strchr() in the C language.
I checked that the answer should be like this:
char *my_strchr(const char *src1, int c) {
while (*src1 != '\0') { //when the string goes to the last, jump out
if (c == *src1) {
return src1;
}
src1++;
}
return NULL;
}
I'm quite confused by:
1. why we use *src1 in the while loop condition (*src1 != '\0'), is *src1 a pointer to the const char*? Can we just use src1 instead?
2. When we return value and src1++, we do not have that *src1, why?
3. For this function, it in fact prints the whole string after the required characters, but why is that? Why does it not print only the character that we need to find?
1. src1 is the pointer to the character, but we need the character itself. It's the same reason as in point 2, but the other way round.
2. If you write return *src1; you simply return the character you've found, that's always c, so your function would be pointless. You want to return the pointer to that char.
3. Because that's what the function is supposed to do. It returns the pointer to the first character c found. So printing the string pointed by that pointer prints the rest of the string.
It's important here to remember that in C a string is a series of characters that ends with a null ('\0') character. We reference the string in our code using a character pointer that points to the beginning of the string. When we pass a string as a parameter to a function what we're really getting is a pointer to the first character in the string.
Because of this fact, we can use pointer math to increment through a string. The pattern:
while (*src1 != '\0') {
//do stuff
src1++;
}
is a very common idiom in C. We might phrase it in English as:
While the value of the character in the string we are looking at (dereference src1 with the * operator) is not (inequality operator !=) the end of string indicator (null byte, 0 or '\0'), do some stuff, then move the pointer to point to the next character in the string (increment operator ++).
We often use the same kind of code structure to process other arrays or linked lists of things, comparing pointers to the NULL pointer.
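Here is one more instance of that idiom, as a rough sketch. The function count_char is a made-up example, not something from the question; it walks the string exactly as described above and counts matches along the way:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example of the idiom: count how often c occurs in s. */
size_t count_char(const char *s, char c)
{
    assert(s != NULL);
    size_t n = 0;
    while (*s != '\0') {     /* stop at the null terminator */
        if (*s == c) {
            n++;
        }
        s++;                 /* move on to the next character */
    }
    return n;
}
```

For example, count_char("banana", 'a') gives 3, and an empty string gives 0 because the loop body never runs.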
To question #2, we're returning the value of the pointer src1 from this function, and not the value of what it points to (*src1), because the question that this function should answer is "Where is the first instance of character c in the string that starts at the location pointed to by src1?"
Question #3 implies that the code that calls this function is printing a string that starts from the pointer returned from this function. My guess is that the code looks something like this:
printf("%s", my_strchr(string, 'a'));
printf() and friends will print from the location provided in the argument list that matches up with the %s format specifier and then keep printing until it gets to the end of string character ('\0', the null terminator).
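To make the pointer-versus-character distinction concrete, here is a small sketch built on the standard strchr. The helper names index_of and first_match_char are assumptions for illustration only; the point is that the returned pointer can be turned into a position or dereferenced to get the single character:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: index of the first c in s, or -1 if absent. */
ptrdiff_t index_of(const char *s, int c)
{
    assert(s != NULL);
    const char *hit = strchr(s, c);   /* standard library strchr */
    return hit ? hit - s : -1;
}

/* Hypothetical helper: the single character at the match, or '\0' if absent. */
char first_match_char(const char *s, int c)
{
    assert(s != NULL);
    const char *hit = strchr(s, c);
    return hit ? *hit : '\0';         /* dereference only a non-NULL result */
}
```

So to print only the found character, you would check the result for NULL and then print *result with %c, rather than passing the pointer to %s.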
In C, a string is basically an array of char terminated by '\0'. In most expressions an array is converted to a pointer to its first element, and a pointer holds the memory address of a value. Therefore:
You can use *src1 to get the value that it is pointing to.
src1++ means to +1 on the address, so you are basically moving where the pointer is pointing at.
Since you are returning a pointer, it is essentially equal to a string (char array).
In addition to Jabberwocky's answer, please note that the code in the question has 2 bugs:
c must be converted to char for the comparison with *src1: strchr("ABC", 'A' + 256) returns a pointer to the string literal unless char has more than 8 bits.
Furthermore, if c converted to a char has the value 0, a pointer to the null terminator should be returned, not NULL as in the posted code.
Here is a modified version:
char *my_strchr(const char *s, int c) {
    for (;;) {
        if ((char)c == *s) {
            return (char *)s;  // cast away const, as the standard strchr does
        }
        if (*s++ == '\0') {
            return NULL;
        }
    }
}
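The terminator case can be checked against the standard library's strchr, which a replacement would be expected to match. A small sketch (finds_terminator is a made-up name, not part of any library):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: true if searching for '\0' yields a pointer
   to the string's terminator rather than NULL. */
bool finds_terminator(const char *s)
{
    assert(s != NULL);
    const char *hit = strchr(s, '\0');
    return hit != NULL && hit == s + strlen(s);
}
```

This holds for any valid string, including the empty one, because the terminating null character counts as part of the string for strchr.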

Explanation needed on pointer arithmetic with an array of pointers to string literals

I'm currently trying to learn C, with some prior experience in Python and
pointer arithmetic is really confusing me right now.
#include <stdio.h>
int main()
{
char *s[] = {"String1", "Literal2", "Pointers3"};
printf("%c", s[1]+1);
return 0;
}
Why does it print m instead of i ?
When I replace the format string with %s it does what I expect and prints out iteral2 (go to the 0th index of the 1st string literal, then move 1 memory address forward and print the rest).
How does this work, why does it print out a seemingly arbitrary character instead of the 1st(or 1th?) index when I use the %c format string.
The %c format specifier expects a character, not a pointer to a character. The expression s[1] evaluates to a pointer to a character, pointing to "Literal2", and the expression s[1]+1 also evaluates to a pointer to a character, pointing to "iteral2".
So, you are passing printf() a pointer to a character, and you are telling it to print a character. So, what is happening is that the pointer is being re-interpreted as a character, and the output is garbage.
If you insert a character into "String1", (making it, say, "String11",) then everything will be moved upwards in memory by one location, so the value of the pointer will be greater by 1, and so it might print n instead of m.
To obtain a character, when all you have is a pointer to a character, you need to dereference the pointer. So, that would be "%c", *(s[1]+1).
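A short sketch of that dereference, wrapped in a function so the types are visible. The name second_char_of is hypothetical, and the function assumes each string has at least one character:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: second character of the i-th string in words. */
char second_char_of(const char *const words[], size_t i)
{
    assert(words != NULL && words[i] != NULL && words[i][0] != '\0');
    const char *p = words[i] + 1;   /* still a pointer: suitable for %s */
    return *p;                      /* a char: suitable for %c, same as words[i][1] */
}
```

With the array from the question, second_char_of(s, 1) yields 'i', which is what %c needs.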
#include <stdio.h>

int main() {
    const char *s[] = {"String1", "Literal2", "Pointers3"};
    printf("%c", s[1][1]);
    return 0;
}
Also, I think you should declare s[] as an array of const char *, since the string literals it points to must not be modified.

What's wrong with this example with string literal?

I'm reading an answer from this site which says the following is undefined
char *fubar = "hello world";
*fubar++; // SQUARELY UNDEFINED BEHAVIOUR!
But isn't fubar++ done first, which moves the pointer to e, and then *() is applied, which extracts the e? I know this is supposed to be asked on chat (I'm a kind person), but no one is there, so I'm asking here to attract notice.
The location of the ++ is the key: If it's a suffix (like in this case) then the increment happens after.
Also due to operator precedence you increment the pointer.
So what happens is that the pointer fubar is dereferenced (resulting in 'h', which is then ignored), and then the pointer variable fubar is incremented to point to the 'e'.
In short: *fubar++ is fine and valid.
If it was (*fubar)++ then it would be undefined behavior, since then it would attempt to increase the first characters of the string. And literal strings in C are arrays of read-only characters, so attempting to modify a character in a literal string would be undefined behavior.
The expression *fubar++ is essentially equal to
char *temporary_variable = fubar;
fubar = fubar + 1;
*temporary_variable; // the result of the whole expression
The code shown is clearly not undefined behaviour, since *fubar++ is somewhat equal to char result; (result = *fubar, fubar++, result), i.e. it increments the pointer, and not the dereferenced value, and the result of the expression is the (dereferenced) value *fubar before the pointer got incremented. *fubar++ actually gives you the character value to which fubar originally points, but you simply make no use of this "result" and ignore it.
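The same behaviour can be spelled out as a function, as a sketch only. take_and_advance is a made-up name; it does to *pp what *fubar++ does to fubar, and it assumes the caller keeps the pointer within the string:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the character *pp currently points to
   and then advances *pp by one, like `*fubar++` does for fubar. */
char take_and_advance(const char **pp)
{
    assert(pp != NULL && *pp != NULL);
    const char *before = *pp;   /* remember the old position */
    *pp = *pp + 1;              /* the pointer is incremented ... */
    return *before;             /* ... but the old character is the result */
}
```

Calling it on a pointer to "hello world" returns 'h' and leaves the pointer at 'e'. Nothing is written to the literal, which is why no undefined behaviour is involved.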
Note, however, that the following code does introduce undefined behaviour:
char *fubar = "hello world";
(*fubar)++;
This is because this increments the value to which fubar points and thereby manipulates a string literal -> undefined behaviour.
When replacing the string literal with a character array, everything is OK again:
#include <stdio.h>

int main() {
    char test[] = "hello world";  // writable array initialized from the literal
    char* fubar = test;
    (*fubar)++;                   // 'h' becomes 'i'
    printf("%s\n",fubar);
    return 0;
}
Output:
iello world

Char pointers and the printf function

I was trying to learn pointers and I wrote the following code to print the value of the pointer:
#include <stdio.h>
int main(void) {
char *p = "abc";
printf("%c",*p);
return 0;
}
The output is:
a
however, if I change the above code to:
#include <stdio.h>
int main(void) {
char *p = "abc";
printf(p);
return 0;
}
I get the output:
abc
I don't understand the following 2 things:
why did printf not require a format specifier in the second case? Is printf(pointer_name) enough to print the value of the pointer?
as per my understanding (which is very little), *p points to a contiguous block of memory that contains abc. I expected both outputs to be the same, i.e.
abc
are the different outputs because of the different ways of printing?
Edit 1
Additionally, the following code produces a runtime error. Why so?
#include <stdio.h>
int main(void) {
char *p = "abc";
printf(*p);
return 0;
}
For your first question, the printf function (and family) takes a string as first argument (i.e. a const char *). That string could contain format codes that the printf function will replace with the corresponding argument. The rest of the text is printed as-is, verbatim. And that's what is happening when you pass p as the first argument.
Do note that using printf this way is strongly discouraged, especially if the string contains input from a user. If the user adds formatting codes to the string and you don't provide the correct arguments, then you will have undefined behavior. It could even lead to security holes.
For your second question, the variable p points to some memory. The expression *p dereferences the pointer to give you a single character, namely the one that p is actually pointing to, which is p[0].
Think of p like this:
+---+ +-----+-----+-----+------+
| p | ---> | 'a' | 'b' | 'c' | '\0' |
+---+ +-----+-----+-----+------+
The variable p doesn't really point to a "string", it only points to some single location in memory, namely the first character in the string "abc". It's the functions using p that treat that memory as a sequence of characters.
Furthermore, string literals are actually stored as (read-only) arrays whose length is the number of characters in the string plus one for the string terminator.
Also, to help you understand why *p is the same as p[0], you need to know that for any pointer or array p and valid index i, the expression p[i] is equal to *(p + i). To get the first character, you have index 0, which means you have p[0], which then should be equal to *(p + 0). Adding zero to anything is a no-op, so *(p + 0) is the same as *(p), which is the same as *p. Therefore p[0] is equal to *p.
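If you want to convince yourself of that equivalence, a tiny check like the following sketch will do. same_element is a hypothetical name, and i must be a valid index into the array p points to:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: p[i] and *(p + i) name the same object. */
bool same_element(const char *p, size_t i)
{
    assert(p != NULL);
    return &p[i] == p + i && p[i] == *(p + i);
}
```

For p pointing at "abc", this is true for every index from 0 up to 3, where index 3 is the terminating '\0'.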
Regarding your edit (where you do printf(*p)), since *p returns the value of the first "element" pointed to by p (i.e. p[0]) you are passing a single character as the pointer to the format string. This will lead the compiler to convert it to a pointer which is pointing to whatever address has the value of that single character (it doesn't convert the character to a pointer to the character). This address is not a very valid address (in the ASCII alphabet 'a' has the value 97 which is the address where the program will look for the string to print) and you will have undefined behavior.
p is the format string.
char *p = "abc";
printf(p);
is the same as
printf("abc");
Doing this is very bad practice because you don't know what the variable
will contain, and if it contains format specifiers, calling printf may do very bad things.
The reason why the first case (with "%c") only printed the first character
is that %c means a byte and *p means the (first) value which p is pointing at.
%s would print the entire string.
char *p = "abc";
printf(p); /* If p is untrusted, bad things will happen, otherwise the string p is written. */
printf("%c", *p); /* print the first byte in the string p */
printf("%s", p); /* print the string p */
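To see why the "%s" form is the safe one, here is a sketch that formats into a buffer instead of stdout so the result can be inspected. The helper name format_verbatim is an assumption for illustration; it relies only on the standard snprintf:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies text into out verbatim, so any '%' in it is
   treated as data rather than as a conversion specification. */
int format_verbatim(char *out, size_t size, const char *text)
{
    assert(out != NULL && text != NULL && size > 0);
    return snprintf(out, size, "%s", text);
}
```

Passing "abc%def" through this helper leaves the % in the output untouched, whereas handing the same string to printf as the format would make printf look for an argument that was never supplied.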
why did printf not require a format specifier in the second case? Is printf(pointer_name) enough to print the value of the pointer?
With your code you told printf to use your string as the format string. Meaning your code turned equivalent to printf("abc").
as per my understanding (which is very little), *p points to a contiguous block of memory that contains abc. I expected both outputs to be the same
If you use %c you get a character printed, if you use %s you get a whole string. But if you tell printf to use the string as the format string, then it will do that too.
char *p = "abc";
printf(*p);
This code crashes because *p, the character 'a', is not a pointer to a format string; it is not a pointer at all. That code should not even compile without warnings.
You are misunderstanding, indeed when you do
char *p = "Hello";
p points to the starting address where literal "Hello" is stored. This is how you declare pointers. However, when afterwards, you do
*p
it means dereference p and obtain object where p points. In our above example this would yield 'H'. This should clarify your second question.
In case of printf just try
printf("Hello");
which is also fine; this answers your first question because it is effectively the same what you did when passed just p to printf.
Finally to your edit, indeed
printf(*p);
above line is not correct since printf expects const char * and by using *p you are passing it a char - or in other words 'H' assuming our example. Read more what dereferencing means.
why did printf not require a format specifier in the second case? Is printf(pointer_name) enough to print the value of the pointer?
"abc" is your format specifier. That's why it's printing "abc". If the string had contained %, then things would have behaved strangely, but they didn't.
printf("abc"); // Perfectly fine!
why did printf not require a format specifier in the second case? Is printf(pointer_name) enough to print the value of the pointer?
%c is the character conversion specifier. It instructs printf to only print the first byte. If you want it to print the string, use...
printf ("%s", p);
The %s seems redundant, but it can be useful for printing control characters or if you use width specifiers.
The best way to understand this really is to try and print the string abc%def using printf.
The %c format specifier expects a char type, and will output a single char value.
The first parameter to printf must be a const char* (a char* can convert implicitly to a const char*) and points to the start of a string of characters. It stops printing when it encounters a \0 in that string. If there is not a \0 present in that string then the behaviour of that function is undefined. Because "abc" doesn't contain any format specifiers, you don't pass any additional arguments to printf in that case.

Multiple Reference and Dereference in C

Can somebody clearly explain to me the concept behind multiple reference and dereference? Why does the following program give 'h' as output?
int main()
{
char *ptr = "hello";
printf("%c\n", *&*&*ptr);
getchar();
return 0;
}
And why does this one not do the same, but produce 'd' instead?
int main()
{
char *ptr = "hello";
printf("%c\n", *&*&ptr);
getchar();
return 0;
}
I read that consecutive use of '*' and '&' cancels each other but this explanation does not provide the reason behind two different outputs generated in above codes?
The first program produces h because &s and *s "cancel" each other: "dereferencing an address of X" gives back the X:
ptr - a pointer to the initial character of "hello" literal
*ptr - dereference of a pointer to the initial character, i.e. the initial character
&*ptr the address of the dereference of a pointer to the initial character, i.e. a pointer to the initial character, i.e. ptr itself
And so on. As you can see, a pair *& brings you back to where you have started, so you can eliminate all such pairs from your dereference / take address expressions. Therefore, your first program's printf is equivalent to
printf("%c\n", *ptr);
The second program has undefined behavior, because a pointer is being passed to printf with the format specifier of %c. If you pass the same expression to %s, the word hello would be printed:
printf("%s\n", *&*&ptr);
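The cancellation can be checked with a sketch like this one, which wraps both expressions in functions so that each result has an explicit type. The names deref_chain and pointer_chain are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Same expression as the first program: the pairs cancel, leaving *ptr. */
char deref_chain(const char *ptr)
{
    assert(ptr != NULL);
    return *&*&*ptr;
}

/* Same expression as the second program: the pairs cancel, leaving ptr. */
const char *pointer_chain(const char *ptr)
{
    return *&*&ptr;
}
```

The return types show the difference: the first expression is a char, which suits %c, and the second is still a pointer, which suits %s.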
Let's go through the important parts of the program:
char *ptr = "hello";
makes a pointer to char which points to the string literal "hello". Now, for the confusing part:
printf("%c\n", *&*&*ptr);
Here, %c expects a char. Let us look into what type *&*&*ptr is. ptr is a char*. Applying the dereference operator(*) gives a char. Applying the address-of operator to this char gives back the char*. This is repeated again, and finally, the * at the end gives us a char, the first character of the string literal "hello", which gets printed.
In the second program, in *&*&ptr, you first apply the & operator, which gives a char**. Applying * to this gives back the char*. This is repeated again, and finally we get a char*. But %c expects a char, not a char*. So the second program exhibits undefined behavior as per the C11 standard (emphasis mine):
7.21.6.1 The fprintf function
[...]
If a conversion specification is invalid, the behavior is undefined. If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.
So, basically, anything can happen when you execute the second program. Your program might crash, emit a segmentation-fault, output weird things, or do something else.
BTW, you are right about saying:
I read that consecutive use of '*' and '&' cancels each other
Let's break down what *&*&*ptr actually is, but first, remember that when applying * to a pointer, it gives you what that pointer points to. On the other hand, when applying & to a variable, it gives you the address in memory for that variable.
Now, after getting a steady ground, let's see what you have here:
ptr is a pointer to a char, so *ptr gives you the data that ptr points to. In this case ptr points to the string "hello"; however, a char can hold only one character, not a whole string, so ptr points to the beginning of the string, which is its first character, h.
Moving on: *&*&*ptr = *&*&(*ptr) = *&*&('h') = *&*(&'h') = *&*(ptr) = *&(*ptr) = *&('h') = *ptr = 'h'
If you apply the same pattern on the next function, I'm pretty sure you can figure it out.
SUMMARY: read pointers from the right to the left!
